Abstract

Many behavioural equivalences or preorders for probabilistic processes involve a lifting operation 
that turns a relation on states into a relation on distributions of states. We show that several existing 
proposals for lifting relations can be reconciled to be different presentations of essentially the same lifting 
operation. More interestingly, this lifting operation corresponds closely to the Kantorovich metric, a
fundamental concept used in mathematics to lift a metric on states to a metric on distributions of states.
The lifting operation is also related to the maximum flow problem in optimisation theory.

The lifting operation yields a neat notion of probabilistic bisimulation, for which we provide logical, 
metric, and algorithmic characterisations. Specifically, we extend the Hennessy-Milner logic and the 
modal mu-calculus with a new modality, resulting in an adequate and an expressive logic for probabilistic 
bisimilarity, respectively. The correspondence of the lifting operation and the Kantorovich metric leads to 
a natural characterisation of bisimulations as pseudometrics which are post-fixed points of a monotone 
function. We also present an "on the fly" algorithm to check if two states in a finitary system are
related by probabilistic bisimilarity, exploiting the close relationship between the lifting operation and
the maximum flow problem.

1 Introduction 

In the last three decades a wealth of behavioural equivalences have been proposed in concurrency theory. 
Among them, bisimilarity [43, 48] is probably the most studied one as it admits a suitable semantics, an
elegant co-inductive proof technique, as well as efficient decision algorithms. 

In recent years, probabilistic constructs have proven useful for giving quantitative specifications
of system behaviour. The first papers on probabilistic concurrency theory [25[ [5l |38] proceed by replacing 
nondeterministic with probabilistic constructs. The reconciliation of nondeterministic and probabilistic con- 
structs starts with gT] and has received a lot of attention in the literature [gTHMlHOllSgll^HTl lHl lg^HIl 
ini[S71lll[Tl[Iil[Il[T2]. 

We shall also work in a framework that features the co-existence of probability and nondeterminism. 
More specifically, we deal with probabilistic labelled transition systems (pLTSs) which are an extension 
of the usual labelled transition systems (LTSs) so that a step of transition is in the form $s \xrightarrow{a} \Delta$, meaning
that state $s$ can perform action $a$ and evolve into a distribution $\Delta$ over some successor states. In this setting
state $s$ is related to state $t$ by a relation $\mathcal{R}$, say probabilistic simulation, written $s \mathrel{\mathcal{R}} t$, if for each transition
$s \xrightarrow{a} \Delta$ from $s$ there exists a transition $t \xrightarrow{a} \Theta$ from $t$ such that $\Theta$ can somehow simulate the behaviour of
$\Delta$ according to $\mathcal{R}$. To formalise the mimicking of $\Delta$ by $\Theta$, we have to lift $\mathcal{R}$ to be a relation $\mathcal{R}^\dagger$ between
distributions over states and require $\Delta \mathrel{\mathcal{R}^\dagger} \Theta$.

Various approaches of lifting relations have appeared in the literature; see e.g. [371 [54l [Ml [8j [12]. We 
will show that although those approaches appear different, they can be reconciled. Essentially, there is 
only one lifting operation, which has been presented in different forms. Moreover, we argue that the lifting 
operation is interesting in itself. This is justified by its intrinsic connection with some fundamental concepts
in mathematics, notably the Kantorovich metric [34]. For example, it turns out that our lifting of binary
relations from states to distributions nicely corresponds to the lifting of metrics from states to distributions 
by using the Kantorovich metric. In addition, the lifting operation is closely related to the maximum flow 
problem in optimisation theory, as observed by Baier et al. [1]. 

A good scientific concept is often elegant when seen from many different perspectives. Bisimulation is one
such concept in traditional concurrency theory, as it can be characterised in a great many ways, such
as through fixed point theory, modal logics, game theory, coalgebras etc. We believe that probabilistic bisimulation is
also one such concept in probabilistic concurrency theory. As evidence, we will provide in this paper
three characterisations, from the perspectives of modal logics, metrics, and decision algorithms.

1. Our logical characterisation of probabilistic bisimulation consists of two aspects: adequacy and expressivity [50]. A logic $\mathcal{L}$ is adequate when two states are bisimilar if and only if they satisfy exactly the
same set of formulae in $\mathcal{L}$. The logic is expressive when each state $s$ has a characteristic formula $\varphi_s$ in $\mathcal{L}$
such that $t$ is bisimilar to $s$ if and only if $t$ satisfies $\varphi_s$. We will introduce a probabilistic choice modality
to capture the behaviour of distributions. Intuitively, distribution $\Delta$ satisfies the formula $\bigoplus_{i \in I} p_i \cdot \varphi_i$ if
there is a decomposition of $\Delta$ into a convex combination of some distributions, $\Delta = \sum_{i \in I} p_i \cdot \Delta_i$, and each
$\Delta_i$ conforms to the property specified by $\varphi_i$. When the new modality is added to the Hennessy-Milner
logic [35] we obtain an adequate logic for probabilistic bisimilarity; when it is added to the modal
mu-calculus [36] we obtain an expressive logic.

2. By metric characterisation of probabilistic bisimulation, we mean to give a pseudometric such that 
two states are bisimilar if and only if their distance is 0 when measured by the pseudometric. More
specifically, we show that bisimulations correspond to pseudometrics which are post-fixed points of a 
monotone function, and in particular bisimilarity corresponds to a pseudometric which is the greatest 
fixed point of the monotone function. 

3. As to the algorithmic characterisation, we propose an "on the fly" algorithm that checks if two states 
are related by probabilistic bisimilarity. The schema of the algorithm is to approximate probabilistic 
bisimilarity by iteratively accumulating information about state pairs $(s,t)$ where $s$ and $t$ are not
bisimilar. In each iteration we dynamically construct a relation $\mathcal{R}$ as an approximant. Then we verify
if every transition from one state can be matched up by a transition from the other state, and their
resulting distributions are related by the lifted relation $\mathcal{R}^\dagger$, which involves solving the maximum flow
problem of an appropriately constructed network, by taking advantage of the close relation between 
our lifting operation and the above mentioned maximum flow problem. 

Related work Probabilistic bisimulation was first introduced by Larsen and Skou [37]. Later on, it was
investigated in a great many probabilistic models. An adequate logic for probabilistic bisimulation in a
setting similar to our pLTSs has been studied in [321 SH]. It is also based on a probabilistic extension of the
Hennessy-Milner logic. The main difference from our logic in Section 5.1 is the introduction of the operator
$[\cdot]_p$. Intuitively, a distribution $\Delta$ satisfies the formula $[\varphi]_p$ when the set of states satisfying $\varphi$ is measured by
$\Delta$ with probability at least $p$. So the formula $[\varphi]_p$ can be expressed by our logic in terms of the probabilistic
choice $\bigoplus_{i \in I} p_i \cdot \varphi_i$ by setting $I = \{1,2\}$, $p_1 = p$, $p_2 = 1 - p$, $\varphi_1 = \varphi$, and $\varphi_2 = \mathit{true}$. When restricted to
deterministic pLTSs (i.e., for each state and for each action, there exists at most one outgoing transition
from the state), probabilistic bisimulations can be characterised by simpler forms of logics, as observed in
[Sa [161149].

An expressive logic for nonprobabilistic bisimulation has been proposed in [55]. In this paper we partially
extend the results of [55] to a probabilistic setting that admits both probabilistic and nondeterministic choice.
We present a probabilistic extension of the modal mu-calculus [33], where a formula is interpreted as the
set of states satisfying it. This is in contrast to the probabilistic semantics of the mu-calculus as studied in
[2211111112] where formulae denote lower bounds of probabilistic evidence of properties, and the semantics of
the generalised probabilistic logic of [B] where a mu-calculus formula is interpreted as a set of deterministic
trees that satisfy it.

The Kantorovich metric has been used by van Breugel et al. for defining behavioural pseudometrics on
fully probabilistic systems [GT] [64] [60] and reactive probabilistic systems [62] [63] [58] [59]; and by Desharnais
et al. for labelled Markov chains [T71 [H] and labelled concurrent Markov chains [TH]; and later on by
Ferns et al. for Markov decision processes [221 [M]; and by Deng et al. for action-labelled quantitative
transition systems jjj. One exception is [20], which proposes a pseudometric for labelled Markov chains
without using the Kantorovich metric. Instead, it is based on a notion of $\varepsilon$-bisimulation, which relaxes
the definition of probabilistic bisimulation by allowing small perturbations of probabilities. In this paper
we are mainly interested in the correspondence of our lifting operation to the Kantorovich metric. The
metric characterisation of probabilistic bisimulation in Section 6 is merely a direct consequence of this
correspondence.

Decision algorithms for probabilistic bisimilarity and similarity have been considered by Baier et al. in
[2] and Zhang et al. in [68]. Their algorithms are global in the sense that the whole state space has to be
fully generated in advance. In contrast, "on the fly" algorithms are local in the sense that the state space
is dynamically generated, which is often more efficient when it comes to determining that one state fails to be related to
another. Our algorithm in Section 7 is inspired by [5] because we also reduce the problem of checking if
two distributions are related by a lifted relation to the maximum flow problem of a suitable network. We
generalise the local algorithm for checking nonprobabilistic bisimilarity in [55] to the probabilistic setting.

This paper provides a relatively comprehensive account of probabilistic bisimulation. Some of the results 
or their variants were mentioned previously in [7] [HI HOI 111] . Here they are presented in a uniform way and 
equipped with detailed proofs. 

Outline of the paper The paper proceeds by recalling a way of lifting binary relations from states to
distributions, and showing its coincidence with a few other ways in Section 2. The lifting operation is justified
in Section 3 in terms of its correspondence to the Kantorovich metric and the maximum flow problem. In
Section 4 we define probabilistic bisimulation and show its infinite approximation. In Section 5 we introduce
a probabilistic choice modality, then extend the Hennessy-Milner logic and the modal mu-calculus so as to
obtain two logics that are adequate and expressive, respectively. In Section 6 we characterise probabilistic
bisimulations as pseudometrics. In Section 7 we exploit the correspondence of our lifting operation to the
maximum flow problem, and present a polynomial time decision algorithm. Finally, Section 8 concludes the
paper.

2 Lifting relations 

In the probabilistic setting, formal systems are usually modelled as distributions over states. To compare 
two systems involves the comparison of two distributions. So we need a way of lifting relations on states to 
relations on distributions. This is used, for example, to define probabilistic bisimulation as we shall see in 
Section 4. A few approaches to lifting relations have appeared in the literature. We will take the one from
[12], and show its coincidence with two other approaches. 

We first fix some notation. A (discrete) probability distribution over a set $S$ is a mapping $\Delta : S \to [0,1]$
with $\sum_{s \in S} \Delta(s) = 1$. The support of $\Delta$ is given by $\lceil \Delta \rceil := \{ s \in S \mid \Delta(s) > 0 \}$. In this paper we only
consider finite state systems, so it suffices to use distributions with finite support; let $\mathcal{D}(S)$, ranged over
by $\Delta, \Theta$, denote the collection of all such distributions over $S$. We use $\overline{s}$ to denote the point distribution,
satisfying $\overline{s}(t) = 1$ if $t = s$, and $0$ otherwise. If $p_i \geq 0$ and $\Delta_i$ is a distribution for each $i$ in some finite index
set $I$, then $\sum_{i \in I} p_i \cdot \Delta_i$ is given by

\[ \Big(\sum_{i \in I} p_i \cdot \Delta_i\Big)(s) = \sum_{i \in I} p_i \cdot \Delta_i(s). \]

If $\sum_{i \in I} p_i = 1$ then this is easily seen to be a distribution in $\mathcal{D}(S)$. Finally, the product of two probability
distributions $\Delta, \Theta$ over $S, T$ is the distribution $\Delta \times \Theta$ over $S \times T$ defined by $(\Delta \times \Theta)(s,t) := \Delta(s) \cdot \Theta(t)$.
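
The notation above can be made concrete with a minimal sketch. It assumes that a finite-support distribution is represented as a Python dictionary from states to exact fractions; the helper names `point`, `support` and `convex_combination` are hypothetical and are introduced only for illustration.

```python
from fractions import Fraction


def point(s):
    """Point distribution on state s: all mass on s."""
    return {s: Fraction(1)}


def support(dist):
    """States that carry strictly positive probability."""
    return {s for s, p in dist.items() if p > 0}


def convex_combination(weighted):
    """Compute sum_i p_i * Delta_i from a list of (p_i, Delta_i) pairs."""
    result = {}
    for p, dist in weighted:
        for s, q in dist.items():
            result[s] = result.get(s, Fraction(0)) + p * q
    return result


half = Fraction(1, 2)
delta = convex_combination([(half, point("s1")),
                            (half, {"s1": half, "s2": half})])
print(delta["s1"], delta["s2"])  # 3/4 1/4
```

Exact fractions are used instead of floats so that the condition that the weights sum to 1 can be checked without rounding errors.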
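Definition 2.1 Let $S$ and $T$ be two sets and $\mathcal{R} \subseteq S \times T$ a relation. Then $\mathcal{R}^\dagger \subseteq \mathcal{D}(S) \times \mathcal{D}(T)$ is the smallest
relation that satisfies:

1. $s \mathrel{\mathcal{R}} t$ implies $\overline{s} \mathrel{\mathcal{R}^\dagger} \overline{t}$

2. $\Delta_i \mathrel{\mathcal{R}^\dagger} \Theta_i$ implies $(\sum_{i \in I} p_i \cdot \Delta_i) \mathrel{\mathcal{R}^\dagger} (\sum_{i \in I} p_i \cdot \Theta_i)$, where $I$ is a finite index set and $\sum_{i \in I} p_i = 1$.

The lifting construction satisfies the following useful property, whose proof is straightforward and thus omitted.

Proposition 2.2 Suppose $\mathcal{R} \subseteq S \times S$ and $\sum_{i \in I} p_i = 1$. If $(\sum_{i \in I} p_i \cdot \Delta_i) \mathrel{\mathcal{R}^\dagger} \Theta$ then $\Theta = \sum_{i \in I} p_i \cdot \Theta_i$ for
some set of distributions $\Theta_i$ such that $\Delta_i \mathrel{\mathcal{R}^\dagger} \Theta_i$. □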

We now look at alternative presentations of Definition 2.1. The proposition below is immediate.

Proposition 2.3 Let $\Delta$ and $\Theta$ be distributions over $S$ and $T$, respectively, and $\mathcal{R} \subseteq S \times T$. Then $\Delta \mathrel{\mathcal{R}^\dagger} \Theta$
if and only if $\Delta, \Theta$ can be decomposed as follows:

1. $\Delta = \sum_{i \in I} p_i \cdot \overline{s_i}$, where $I$ is a finite index set and $\sum_{i \in I} p_i = 1$

2. For each $i \in I$ there is a state $t_i$ such that $s_i \mathrel{\mathcal{R}} t_i$

3. $\Theta = \sum_{i \in I} p_i \cdot \overline{t_i}$.

An important point here is that in the decomposition of $\Delta$ into $\sum_{i \in I} p_i \cdot \overline{s_i}$, the states $s_i$ are not necessarily
distinct: that is, the decomposition is not in general unique. Thus when establishing the relationship between
$\Delta$ and $\Theta$, a given state $s$ in $\Delta$ may play a number of different roles.

From Definition 2.1 the next two properties follow. In fact, they are sometimes used in the literature
as definitions of lifting relations instead of being properties (see e.g. [Ml l37]).

Theorem 2.4 1. Let $\Delta$ and $\Theta$ be distributions over $S$ and $T$, respectively. Then $\Delta \mathrel{\mathcal{R}^\dagger} \Theta$ if and only if
there exists a weight function $w : S \times T \to [0,1]$ such that

(a) $\forall s \in S : \sum_{t \in T} w(s,t) = \Delta(s)$

(b) $\forall t \in T : \sum_{s \in S} w(s,t) = \Theta(t)$

(c) $\forall (s,t) \in S \times T : w(s,t) > 0 \Rightarrow s \mathrel{\mathcal{R}} t$.

2. Let $\Delta, \Theta$ be distributions over $S$ and $\mathcal{R}$ be an equivalence relation. Then $\Delta \mathrel{\mathcal{R}^\dagger} \Theta$ if and only if
$\Delta(C) = \Theta(C)$ for all equivalence classes $C \in S/\mathcal{R}$, where $\Delta(C)$ stands for the accumulated probability
$\sum_{s \in C} \Delta(s)$.

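Proof: 1. ($\Rightarrow$) Suppose $\Delta \mathrel{\mathcal{R}^\dagger} \Theta$. By Proposition 2.3 we can decompose $\Delta$ and $\Theta$ such that $\Delta = \sum_{i \in I} p_i \cdot \overline{s_i}$,
$\Theta = \sum_{i \in I} p_i \cdot \overline{t_i}$, and $s_i \mathrel{\mathcal{R}} t_i$ for all $i \in I$. We define the weight function $w$ by letting
$w(s,t) = \sum \{ p_i \mid s_i = s, t_i = t, i \in I \}$ for any $s \in S, t \in T$. This weight function can be checked to meet
our requirements.

(a) For any $s \in S$, it holds that

\[
\begin{array}{rcl}
\sum_{t \in T} w(s,t) &=& \sum_{t \in T} \sum \{ p_i \mid s_i = s, t_i = t, i \in I \} \\
&=& \sum \{ p_i \mid s_i = s, i \in I \} \\
&=& \Delta(s)
\end{array}
\]

(b) Similarly, we have $\sum_{s \in S} w(s,t) = \Theta(t)$.

(c) For any $s \in S, t \in T$, if $w(s,t) > 0$ then there is some $i \in I$ such that $p_i > 0$, $s_i = s$, and $t_i = t$.
It follows from $s_i \mathrel{\mathcal{R}} t_i$ that $s \mathrel{\mathcal{R}} t$.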
($\Leftarrow$) Suppose there is a weight function $w$ satisfying the three conditions in the hypothesis. We construct
the index set $I = \{ (s,t) \mid w(s,t) > 0, s \in S, t \in T \}$ and probabilities $p_{(s,t)} = w(s,t)$ for each $(s,t) \in I$.

(a) It holds that $\Delta = \sum_{(s,t) \in I} p_{(s,t)} \cdot \overline{s}$ because, for any $s' \in S$,

\[
\begin{array}{rcl}
\big(\sum_{(s,t) \in I} p_{(s,t)} \cdot \overline{s}\big)(s') &=& \sum_{(s',t) \in I} w(s',t) \\
&=& \sum \{ w(s',t) \mid w(s',t) > 0, t \in T \} \\
&=& \sum \{ w(s',t) \mid t \in T \} \\
&=& \Delta(s')
\end{array}
\]

(b) Similarly, we have $\Theta = \sum_{(s,t) \in I} w(s,t) \cdot \overline{t}$.

(c) For each $(s,t) \in I$, we have $w(s,t) > 0$, which implies $s \mathrel{\mathcal{R}} t$.

Hence, the above decompositions of $\Delta$ and $\Theta$ meet the requirement of the lifting $\Delta \mathrel{\mathcal{R}^\dagger} \Theta$.

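2. ($\Rightarrow$) Suppose $\Delta \mathrel{\mathcal{R}^\dagger} \Theta$. By Proposition 2.3 we can decompose $\Delta$ and $\Theta$ such that $\Delta = \sum_{i \in I} p_i \cdot \overline{s_i}$,
$\Theta = \sum_{i \in I} p_i \cdot \overline{t_i}$, and $s_i \mathrel{\mathcal{R}} t_i$ for all $i \in I$. For any equivalence class $C \in S/\mathcal{R}$, we have that

\[
\begin{array}{rcl}
\Delta(C) = \sum_{s \in C} \Delta(s) &=& \sum_{s \in C} \sum \{ p_i \mid i \in I, s_i = s \} \\
&=& \sum \{ p_i \mid i \in I, s_i \in C \} \\
&=& \sum \{ p_i \mid i \in I, t_i \in C \} \\
&=& \Theta(C)
\end{array}
\]

where the equality in the third line is justified by the fact that $s_i \in C$ iff $t_i \in C$, since $s_i \mathrel{\mathcal{R}} t_i$ and
$C \in S/\mathcal{R}$.

($\Leftarrow$) Suppose, for each equivalence class $C \in S/\mathcal{R}$, it holds that $\Delta(C) = \Theta(C)$. We construct the index
set $I = \{ (s,t) \mid s \mathrel{\mathcal{R}} t \text{ and } s,t \in S \}$ and probabilities $p_{(s,t)} = \frac{\Delta(s)\Theta(t)}{\Delta([s]_{\mathcal{R}})}$ for each $(s,t) \in I$, where $[s]_{\mathcal{R}}$
stands for the equivalence class that contains $s$.

(a) It holds that $\Delta = \sum_{(s,t) \in I} p_{(s,t)} \cdot \overline{s}$ because, for any $s' \in S$,

\[
\begin{array}{rcl}
\big(\sum_{(s,t) \in I} p_{(s,t)} \cdot \overline{s}\big)(s') &=& \sum_{(s',t) \in I} p_{(s',t)} \\
&=& \sum \big\{ \frac{\Delta(s')\Theta(t)}{\Delta([s']_{\mathcal{R}})} \mid s' \mathrel{\mathcal{R}} t, t \in S \big\} \\
&=& \Delta(s')
\end{array}
\]

where the last step uses $\Theta([s']_{\mathcal{R}}) = \Delta([s']_{\mathcal{R}})$.

(b) Similarly, we have $\Theta = \sum_{(s,t) \in I} p_{(s,t)} \cdot \overline{t}$.

(c) For each $(s,t) \in I$, we have $s \mathrel{\mathcal{R}} t$.

Hence, the above decompositions of $\Delta$ and $\Theta$ meet the requirement of the lifting $\Delta \mathrel{\mathcal{R}^\dagger} \Theta$.

□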
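
The second part of Theorem 2.4 suggests a direct test for lifted equivalence relations: compare the mass that the two distributions assign to each equivalence class. The following is a minimal sketch of that test, assuming the equivalence relation is given as a list of disjoint blocks and distributions are dictionaries of exact fractions; the function names and the example states are hypothetical.

```python
from fractions import Fraction


def class_mass(dist, block):
    """Accumulated probability that dist assigns to one equivalence class."""
    return sum((dist.get(s, Fraction(0)) for s in block), Fraction(0))


def lifted_by_equivalence(delta, theta, partition):
    """Theorem 2.4(2): related iff every class gets the same mass."""
    return all(class_mass(delta, block) == class_mass(theta, block)
               for block in partition)


# Hypothetical example: s1 and s2 are equivalent, s3 is on its own.
partition = [{"s1", "s2"}, {"s3"}]
delta = {"s1": Fraction(1, 2), "s3": Fraction(1, 2)}
theta = {"s1": Fraction(1, 4), "s2": Fraction(1, 4), "s3": Fraction(1, 2)}
other = {"s1": Fraction(1, 3), "s3": Fraction(2, 3)}

print(lifted_by_equivalence(delta, theta, partition))  # True
print(lifted_by_equivalence(delta, other, partition))  # False
```

This shortcut only applies when the relation is an equivalence; for arbitrary relations the weight function of the first part of the theorem is needed.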



3 Justifying the lifting operation 

In our opinion, the lifting operation given in Definition 2.1 is not only concise but also on the right track.
This is justified by its intrinsic connection with some fundamental concepts in mathematics, notably the 
Kantorovich metric. 
3.1 Justification by the Kantorovich metric

We begin with some historical notes. The transportation problem has been playing an important role in
linear programming due to its general formulation and methods of solution. The original transportation
problem, formulated by the French mathematician G. Monge in 1781 [45], consists of finding an optimal way
of shovelling a pile of sand into a hole of the same volume. In the 1940s, the Russian mathematician and
economist L.V. Kantorovich, who was awarded a Nobel prize in economics in 1975 for the theory of optimal
allocation of resources, gave a relaxed formulation of the problem and proposed a variational principle for
solving the problem [34]. Unfortunately, Kantorovich's work went unrecognized for a long period of time.
What later became known as the Kantorovich metric has appeared in the literature under different names, because it has been
rediscovered several times from different perspectives. Many metrics known in measure theory,
ergodic theory, functional analysis, statistics, etc. are special cases of the general definition of the Kantorovich
metric [65]. The elegance of the formulation, the fundamental character of the optimality criterion, as well as
the wealth of applications, which keep arising, place the Kantorovich metric in a prominent position among
the mathematical works of the 20th century. In addition, this formulation can be computed in polynomial
time [47], which is an appealing feature for its use in solving applied problems. For example, it is widely
used to solve a variety of problems in business and economy such as market distribution, plant location,
scheduling problems etc. In recent years the metric has attracted the attention of computer scientists [9]: it has
been used in various areas of computer science such as probabilistic concurrency, image retrieval,
data mining, bioinformatics, etc.

Roughly speaking, the Kantorovich metric provides a way of measuring the distance between two distri- 
butions. Of course, this requires first a notion of distance between the basic elements that are aggregated 
into the distributions, which is often referred to as the ground distance. In other words, the Kantorovich 
metric defines a "lifted" distance between two distributions of mass in a space that is itself endowed with 
a ground distance. There are a host of metrics available in the literature (see e.g. [26]) to quantify the
distance between probability measures; see [52] for a comprehensive review of metrics in the space of probability measures. The Kantorovich metric has an elegant formulation and a natural interpretation in terms
of the transportation problem.

We now recall the mathematical definition of the Kantorovich metric. Let $(X, m)$ be a separable metric
space. (This condition will be used by Theorem 3.4 below.)

Definition 3.1 Given any two Borel probability measures $\Delta$ and $\Theta$ on $X$, the Kantorovich distance between
$\Delta$ and $\Theta$ is defined by

\[ K(\Delta, \Theta) = \sup \Big\{ \Big| \int f \, d\Delta - \int f \, d\Theta \Big| : \|f\| \leq 1 \Big\} \]

where $\| \cdot \|$ is the Lipschitz semi-norm defined by $\|f\| = \sup_{x \neq y} \frac{|f(x) - f(y)|}{m(x,y)}$ for a function $f : X \to \mathbb{R}$ with
$\mathbb{R}$ being the set of all real numbers.

The Kantorovich metric has an alternative characterisation. We denote by $P(X)$ the set of all Borel
probability measures on $X$ such that for all $z \in X$, if $\Delta \in P(X)$ then $\int_X m(x,z) \, \Delta(x) < \infty$. We write
$M(\Delta, \Theta)$ for the set of all Borel probability measures on the product space $X \times X$ with marginal measures
$\Delta$ and $\Theta$, i.e. if $\Gamma \in M(\Delta, \Theta)$ then $\int_{y \in X} d\Gamma(x,y) = d\Delta(x)$ and $\int_{x \in X} d\Gamma(x,y) = d\Theta(y)$ hold.

Definition 3.2 For $\Delta, \Theta \in P(X)$, we define the metric $L$ as follows:

\[ L(\Delta, \Theta) = \inf \Big\{ \int m(x,y) \, d\Gamma(x,y) : \Gamma \in M(\Delta, \Theta) \Big\}. \]

Lemma 3.3 If $(X, m)$ is a separable metric space then $K$ and $L$ are metrics on $P(X)$. □

The famous Kantorovich-Rubinstein duality theorem gives a dual representation of $K$ in terms of $L$.

Theorem 3.4 [Kantorovich-Rubinstein [35]] If $(X, m)$ is a separable metric space then for any two distributions $\Delta, \Theta \in P(X)$ we have $K(\Delta, \Theta) = L(\Delta, \Theta)$. □
In view of the above theorem, many papers in the literature directly take Definition 3.2 as the definition
of the Kantorovich metric. Here we keep the original definition, but it is helpful to understand $K$ by using $L$.
Intuitively, a probability measure $\Gamma \in M(\Delta, \Theta)$ can be understood as a transportation from one unit mass
distribution $\Delta$ to another unit mass distribution $\Theta$. If the distance $m(x,y)$ represents the cost of moving
one unit of mass from location $x$ to location $y$ then the Kantorovich distance gives the optimal total cost of
transporting the mass of $\Delta$ to $\Theta$. We refer the reader to [66] for an excellent exposition on the Kantorovich
metric and the duality theorem.

Many problems in computer science only involve finite state spaces, so discrete distributions with finite
supports are sometimes more interesting than continuous distributions. For two discrete distributions $\Delta$ and
$\Theta$ with finite supports $\{x_1, \ldots, x_n\}$ and $\{y_1, \ldots, y_l\}$, respectively, minimizing the total cost of a discretised
version of the transportation problem reduces to the following linear programming problem:

\[
\begin{array}{ll}
\text{minimize} & \sum_{i=1}^{n} \sum_{j=1}^{l} \Gamma(x_i, y_j) \, m(x_i, y_j) \\
\text{subject to} & \forall 1 \leq i \leq n : \sum_{j=1}^{l} \Gamma(x_i, y_j) = \Delta(x_i) \\
& \forall 1 \leq j \leq l : \sum_{i=1}^{n} \Gamma(x_i, y_j) = \Theta(y_j) \\
& \forall 1 \leq i \leq n, 1 \leq j \leq l : \Gamma(x_i, y_j) \geq 0.
\end{array}
\qquad (1)
\]

Since (1) is a special case of the discrete mass transportation problem, a well-known polynomial time
algorithm such as [47] can be employed to solve it, which is an attractive feature for computer scientists.
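
To make the constraints of the discrete transportation problem tangible, here is a minimal sketch that checks whether a candidate transport plan has the required marginals and computes its total cost. It does not solve the linear program; the cost of any feasible plan is only an upper bound on the Kantorovich distance. The names `is_coupling`, `transport_cost` and `ground`, and the example on the points 0, 1, 2, are assumptions made for illustration.

```python
from fractions import Fraction


def is_coupling(plan, delta, theta):
    """Check non-negativity and both marginal constraints for a plan."""
    if any(v < 0 for v in plan.values()):
        return False
    xs = set(delta) | {x for x, _ in plan}
    ys = set(theta) | {y for _, y in plan}
    rows_ok = all(
        sum((v for (x, _), v in plan.items() if x == a), Fraction(0))
        == delta.get(a, Fraction(0))
        for a in xs)
    cols_ok = all(
        sum((v for (_, y), v in plan.items() if y == b), Fraction(0))
        == theta.get(b, Fraction(0))
        for b in ys)
    return rows_ok and cols_ok


def transport_cost(plan, m):
    """Total cost of moving mass according to plan under ground distance m."""
    return sum((v * m(x, y) for (x, y), v in plan.items()), Fraction(0))


def ground(x, y):
    """Hypothetical 1-bounded ground distance on the points 0, 1, 2."""
    return Fraction(abs(x - y), 2)


half = Fraction(1, 2)
delta = {0: half, 1: half}
theta = {1: half, 2: half}
plan = {(0, 1): half, (1, 2): half}  # shift every unit of mass one step right

print(is_coupling(plan, delta, theta))  # True
print(transport_cost(plan, ground))     # 1/2
```

An actual computation of the distance would minimise `transport_cost` over all feasible plans, for instance with a linear programming or minimum cost flow solver.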



Recall that a pseudometric is a function that yields a non-negative real number for each pair of elements
and satisfies the following: $m(s,s) = 0$, $m(s,t) = m(t,s)$, and $m(s,t) \leq m(s,u) + m(u,t)$, for any $s,t,u \in S$.
We say a pseudometric $m$ is 1-bounded if $m(s,t) \leq 1$ for any $s$ and $t$. Let $\Delta$ and $\Theta$ be distributions over a
finite set $S$ of states. In [5T] a 1-bounded pseudometric $m$ on $S$ is lifted to be a 1-bounded pseudometric $\hat{m}$
on $\mathcal{D}(S)$ by setting the distance $\hat{m}(\Delta, \Theta)$ to be the value of the following linear programming problem:

\[
\begin{array}{ll}
\text{maximize} & \sum_{s \in S} (\Delta(s) - \Theta(s)) \, x_s \\
\text{subject to} & \forall s,t \in S : x_s - x_t \leq m(s,t) \\
& \forall s \in S : 0 \leq x_s \leq 1.
\end{array}
\qquad (2)
\]

This problem can be dualised and then simplified to yield the following problem:

\[
\begin{array}{ll}
\text{minimize} & \sum_{s,t \in S} y_{st} \, m(s,t) \\
\text{subject to} & \forall s \in S : \sum_{t \in S} y_{st} = \Delta(s) \\
& \forall t \in S : \sum_{s \in S} y_{st} = \Theta(t) \\
& \forall s,t \in S : y_{st} \geq 0.
\end{array}
\qquad (3)
\]

Now (3) is in exactly the same form as (1).

This way of lifting pseudometrics via the Kantorovich metric as given in (3) has an interesting connection
with the lifting of binary relations given in Definition 2.1.

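Theorem 3.5 Let $\mathcal{R}$ be a binary relation and $m$ a pseudometric on a state space $S$ satisfying

\[ s \mathrel{\mathcal{R}} t \quad \text{iff} \quad m(s,t) = 0 \qquad (4) \]

for any $s,t \in S$. Then it holds that

\[ \Delta \mathrel{\mathcal{R}^\dagger} \Theta \quad \text{iff} \quad \hat{m}(\Delta, \Theta) = 0 \]

for any distributions $\Delta, \Theta \in \mathcal{D}(S)$.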

Proof: Suppose $\Delta \mathrel{\mathcal{R}^\dagger} \Theta$. From Theorem 2.4(1) we know there is a weight function $w$ such that

1. $\forall s \in S : \sum_{t \in S} w(s,t) = \Delta(s)$

2. $\forall t \in S : \sum_{s \in S} w(s,t) = \Theta(t)$

3. $\forall s,t \in S : w(s,t) > 0 \Rightarrow s \mathrel{\mathcal{R}} t$.

By substituting $w(s,t)$ for $y_{st}$ in (3), the three constraints there can be satisfied. For any $s,t \in S$ we
distinguish two cases:

1. either $w(s,t) = 0$

2. or $w(s,t) > 0$. In this case we have $s \mathrel{\mathcal{R}} t$, which implies $m(s,t) = 0$ by (4).

Therefore, we always have $w(s,t) \, m(s,t) = 0$ for any $s,t \in S$. Consequently, $\sum_{s,t \in S} w(s,t) \, m(s,t) = 0$ and
the optimal value of the problem in (3) must be 0, i.e. $\hat{m}(\Delta, \Theta) = 0$, and the optimal solution is determined
by $w$.

The above reasoning can be reversed to show that the optimal solution of (3) determines a weight function,
thus $\hat{m}(\Delta, \Theta) = 0$ implies $\Delta \mathrel{\mathcal{R}^\dagger} \Theta$. □

The above property will be used in Section 6 to give a metric characterisation of probabilistic bisimulation
(cf. Theorem EH).

3.2 Justification by network flow 

The lifting operation discussed in Section 2 is also related to the maximum flow problem in optimisation
theory. This was already observed by Baier et al. in [5].

We briefly recall the basic definitions of networks. More details can be found in e.g. [2T]. A network is
a tuple $\mathcal{N} = (N, E, \bot, \top, c)$ where $(N, E)$ is a finite directed graph (i.e. $N$ is a set of nodes and $E \subseteq N \times N$
is a set of edges) with two special nodes $\bot$ (the source) and $\top$ (the sink) and a capacity $c$, i.e. a function
that assigns to each edge $(v,w) \in E$ a non-negative number $c(v,w)$. A flow function $f$ for $\mathcal{N}$ is a function
that assigns to each edge $e$ a real number $f(e)$ such that

• $0 \leq f(e) \leq c(e)$ for all edges $e$.

• Let $in(v)$ be the set of incoming edges to node $v$ and $out(v)$ the set of outgoing edges from node $v$.
Then, for each node $v \in N \setminus \{\bot, \top\}$,

\[ \sum_{e \in in(v)} f(e) = \sum_{e \in out(v)} f(e). \]

The flow $F(f)$ of $f$ is given by

\[ F(f) = \sum_{e \in out(\bot)} f(e) - \sum_{e \in in(\bot)} f(e). \]

The maximum flow in $\mathcal{N}$ is the supremum (maximum) over the flows $F(f)$, where $f$ is a flow function in $\mathcal{N}$.

We will see that the question whether $\Delta \mathrel{\mathcal{R}^\dagger} \Theta$ can be reduced to a maximum flow problem in a suitably
chosen network. Suppose $\mathcal{R} \subseteq S \times S$ and $\Delta, \Theta \in \mathcal{D}(S)$. Let $S' = \{ s' \mid s \in S \}$ where $s'$ are pairwise distinct
new states, i.e. $s' \notin S$ for all $s \in S$. We create two states $\bot$ and $\top$ not contained in $S \cup S'$ with $\bot \neq \top$. We
associate with the pair $(\Delta, \Theta)$ the following network $\mathcal{N}(\Delta, \Theta, \mathcal{R})$.

• The nodes are $N = S \cup S' \cup \{\bot, \top\}$.

• The edges are $E = \{ (s, t') \mid (s,t) \in \mathcal{R} \} \cup \{ (\bot, s) \mid s \in S \} \cup \{ (s', \top) \mid s \in S \}$.

• The capacity $c$ is defined by $c(\bot, s) = \Delta(s)$, $c(t', \top) = \Theta(t)$ and $c(s, t') = 1$ for all $s,t \in S$.

The next lemma appeared as Lemma 5.1 in [2].

Lemma 3.6 Let $S$ be a finite set, $\Delta, \Theta \in \mathcal{D}(S)$ and $\mathcal{R} \subseteq S \times S$. The following statements are equivalent.
1. There exists a weight function $w$ for $(\Delta, \Theta)$ with respect to $\mathcal{R}$.

2. The maximum flow in $\mathcal{N}(\Delta, \Theta, \mathcal{R})$ is 1.

□

Since the lifting operation given in Definition 2.1 can also be stated in terms of weight functions, we
obtain the following characterisation using network flow.

Theorem 3.7 Let $S$ be a finite set, $\Delta, \Theta \in \mathcal{D}(S)$ and $\mathcal{R} \subseteq S \times S$. Then $\Delta \mathrel{\mathcal{R}^\dagger} \Theta$ if and only if the maximum
flow in $\mathcal{N}(\Delta, \Theta, \mathcal{R})$ is 1.



The above property will play an important role in Section 7 to give an "on the fly" algorithm for checking
probabilistic bisimilarity.
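
The reduction of Theorem 3.7 can be sketched in a few lines. The code below builds the network for a pair of distributions and a relation, and then runs a textbook Edmonds-Karp maximum flow computation with exact fractions. Edmonds-Karp is chosen here only for brevity and is not claimed to be the algorithm used in the cited works; the function names, the node tags `"L"` and `"R"` (standing for $S$ and its primed copy $S'$) and the example data are assumptions.

```python
from collections import deque
from fractions import Fraction


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow; capacity maps edges (u, v) to Fractions."""
    residual = dict(capacity)
    for (u, v) in capacity:
        residual.setdefault((v, u), Fraction(0))  # reverse residual edges
    neighbours = {}
    for (u, v) in residual:
        neighbours.setdefault(u, []).append(v)
    total = Fraction(0)
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in neighbours.get(u, []):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[e] for e in path)
        for (u, v) in path:
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
        total += bottleneck


def lifted(delta, theta, relation):
    """Decide whether delta and theta are related by the lifted relation."""
    source, sink = ("bot",), ("top",)
    capacity = {}
    for s, p in delta.items():
        capacity[(source, ("L", s))] = p
    for t, q in theta.items():
        capacity[(("R", t), sink)] = q
    for (s, t) in relation:
        capacity[(("L", s), ("R", t))] = Fraction(1)
    return max_flow(capacity, source, sink) == 1


delta = {"s1": Fraction(1, 2), "s2": Fraction(1, 2)}
theta = {"t1": Fraction(1, 3), "t2": Fraction(2, 3)}
print(lifted(delta, theta, {("s1", "t1"), ("s1", "t2"), ("s2", "t2")}))  # True
print(lifted(delta, theta, {("s1", "t1"), ("s2", "t2")}))                # False
```

States outside the supports are simply omitted from the network, since their edges would have capacity 0 and cannot carry flow. In the second call at most 1/3 can flow through `t1` and 1/2 through `s2`, so the maximum flow is 5/6 and the distributions are not related.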

4 Probabilistic bisimulation 

With a solid base of the lifting operation, we can proceed to define a probabilistic version of bisimulation. 
We start with a probabilistic generalisation of labelled transition systems (LTSs). 

Definition 4.1 A probabilistic labelled transition system (pLTS)¹ is a triple
$\langle S, \mathit{Act}, \rightarrow \rangle$, where

1. $S$ is a set of states;

2. $\mathit{Act}$ is a set of actions;

3. $\rightarrow \; \subseteq S \times \mathit{Act} \times \mathcal{D}(S)$ is the transition relation.

As with LTSs, we usually write $s \xrightarrow{a} \Delta$ in place of $(s, a, \Delta) \in \; \rightarrow$. A pLTS is finitely branching if for each
state $s \in S$ the set $\{ (a, \Delta) \mid s \xrightarrow{a} \Delta, a \in \mathit{Act}, \Delta \in \mathcal{D}(S) \}$ is finite; if moreover $S$ is finite, then the pLTS is
finitary.

In a pLTS, one step of transition leaves a single state but might end up in a set of states; each of them 
can be reached with certain probability. An LTS may be viewed as a degenerate pLTS, one in which only 
point distributions are used. 

Let $s$ and $t$ be two states in a pLTS. We say $t$ can simulate the behaviour of $s$ if, whenever the latter can exhibit
action $a$ and lead to distribution $\Delta$, the former can also perform $a$ and lead to a distribution, say
$\Theta$, which can mimic $\Delta$ in successor states. We are interested in a relation between two states, but it is
expressed by invoking a relation between two distributions. To formalise the mimicking of one distribution
by the other, we make use of the lifting operation investigated in Section 2.

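Definition 4.2 A relation $\mathcal{R} \subseteq S \times S$ is a probabilistic simulation if $s \mathrel{\mathcal{R}} t$ implies

• if $s \xrightarrow{a} \Delta$ then there exists some $\Theta$ such that $t \xrightarrow{a} \Theta$ and $\Delta \mathrel{\mathcal{R}^\dagger} \Theta$.

If both $\mathcal{R}$ and $\mathcal{R}^{-1}$ are probabilistic simulations, then $\mathcal{R}$ is a probabilistic bisimulation. The largest probabilistic bisimulation, denoted by $\sim$, is called probabilistic bisimilarity.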

As in the nonprobabilistic setting, probabilistic bisimilarity can be approximated by a family of induc- 
tively defined relations. 

Definition 4.3 Let $S$ be the state set of a pLTS. We define:

• $\sim_0 \; := S \times S$

• $s \sim_{n+1} t$, for $n \geq 0$, if

1. whenever $s \xrightarrow{a} \Delta$, there exists some $\Theta$ such that $t \xrightarrow{a} \Theta$ and $\Delta \mathrel{(\sim_n)^\dagger} \Theta$;

2. whenever $t \xrightarrow{a} \Theta$, there exists some $\Delta$ such that $s \xrightarrow{a} \Delta$ and $\Delta \mathrel{(\sim_n)^\dagger} \Theta$.

• $\sim_\omega \; := \bigcap_{n \geq 0} \sim_n$

In general, $\sim$ is a strictly finer relation than $\sim_\omega$. However, the two relations coincide when limited to finitely
branching pLTSs.

¹Essentially the same model has appeared in the literature under different names such as NP-systems [30], probabilistic
processes [31], simple probabilistic automata [53], probabilistic transition systems [32] etc. Furthermore, there are strong
structural similarities with Markov Decision Processes [51] [15].

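Proposition 4.4 On finitely branching pLTSs, $\sim_\omega$ coincides with $\sim$.

Proof: It is trivial to show by induction that $s \sim t$ implies $s \sim_n t$ for all $n \geq 0$, thus $s \sim_\omega t$.

Now we show that $\sim_\omega$ is a bisimulation. Suppose $s \sim_\omega t$ and $s \xrightarrow{a} \Delta$. We have to show that there is
some $\Theta$ with $t \xrightarrow{a} \Theta$ and $\Delta \mathrel{(\sim_\omega)^\dagger} \Theta$. Consider the set

\[ T := \{ \Theta \mid t \xrightarrow{a} \Theta \wedge \neg(\Delta \mathrel{(\sim_\omega)^\dagger} \Theta) \}. \]

For each $\Theta \in T$, we have $\neg(\Delta \mathrel{(\sim_\omega)^\dagger} \Theta)$, which means that there is some $n_\Theta > 0$ with $\neg(\Delta \mathrel{(\sim_{n_\Theta})^\dagger} \Theta)$. Since $t$ is finitely
branching, $T$ is a finite set. Let $N = \max \{ n_\Theta \mid \Theta \in T \}$. It holds that $\neg(\Delta \mathrel{(\sim_N)^\dagger} \Theta)$ for all $\Theta \in T$, since by a
straightforward induction on $m$ we can show that $s \sim_n t$ implies $s \sim_m t$ for all $m, n \geq 0$ with $n > m$. By
the assumption $s \sim_\omega t$ we know that $s \sim_{N+1} t$. It follows that there is some $\Theta$ with $t \xrightarrow{a} \Theta$ and $\Delta \mathrel{(\sim_N)^\dagger} \Theta$,
so $\Theta \notin T$ and hence $\Delta \mathrel{(\sim_\omega)^\dagger} \Theta$. By symmetry we also have that if $t \xrightarrow{a} \Theta$ then there is some $\Delta$ with $s \xrightarrow{a} \Delta$
and $\Delta \mathrel{(\sim_\omega)^\dagger} \Theta$. □

Proposition 4.4 has appeared in [T|; here we have given a simpler proof.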
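
On a finitary pLTS the approximants of Definition 4.3 can be computed by iterated partition refinement. The sketch below relies on the assumption that every approximant is an equivalence relation, so that the lifted relation can be tested by comparing class masses as in Theorem 2.4(2). The dictionary-based encoding of a pLTS, the function names and the example system are hypothetical; this is a naive global procedure and not the "on the fly" algorithm of Section 7.

```python
from fractions import Fraction


def class_masses(dist, block_of):
    """Mass that dist assigns to each block of the current partition."""
    masses = {}
    for state, p in dist.items():
        if p > 0:
            b = block_of[state]
            masses[b] = masses.get(b, Fraction(0)) + p
    return frozenset(masses.items())


def bisimilarity_classes(plts):
    """Refine the trivial partition until it is stable; returns state -> block id."""
    block_of = {s: 0 for s in plts}  # the 0th approximant relates all states
    while True:
        keys = {s: (block_of[s],
                    frozenset((a, class_masses(d, block_of)) for a, d in plts[s]))
                for s in plts}
        ids = {}
        new_block_of = {s: ids.setdefault(keys[s], len(ids)) for s in plts}
        if len(ids) == len(set(block_of.values())):
            return new_block_of  # no block was split, so the partition is stable
        block_of = new_block_of


one, half = Fraction(1), Fraction(1, 2)
# Hypothetical pLTS: each state maps to a list of (action, distribution) pairs.
plts = {
    "s": [("a", {"s1": half, "s2": half})],
    "t": [("a", {"t1": one})],
    "u": [("a", {"s1": half, "u1": half})],
    "s1": [("b", {"end": one})],
    "s2": [("b", {"end": one})],
    "t1": [("b", {"end": one})],
    "u1": [],
    "end": [],
}
classes = bisimilarity_classes(plts)
print(classes["s"] == classes["t"])  # True
print(classes["s"] == classes["u"])  # False
```

In the example, `s` and `t` end up in the same block because both reach, with total probability 1, states that can only perform `b`; `u` is separated because half of its mass goes to the deadlocked state `u1`.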



5 Logical characterisation 

Let $\mathcal{L}$ be a logic. We use the notation $\mathcal{L}(s)$ to stand for the set of formulae that state $s$ satisfies. This induces
an equivalence relation on states: $s =^{\mathcal{L}} t$ iff $\mathcal{L}(s) = \mathcal{L}(t)$. Thus, two states are equivalent when they satisfy
exactly the same set of formulae.

In this section we consider two kinds of logical characterisations of probabilistic bisimilarity.

Definition 5.1 [Adequacy and expressivity]

1. $\mathcal{L}$ is adequate w.r.t. $\sim$ if for any states $s$ and $t$,

\[ s =^{\mathcal{L}} t \quad \text{iff} \quad s \sim t. \]

2. $\mathcal{L}$ is expressive w.r.t. $\sim$ if for each state $s$ there exists a characteristic formula $\varphi_s \in \mathcal{L}$ such that, for
any states $s$ and $t$,

\[ t \models \varphi_s \quad \text{iff} \quad s \sim t. \]

We will propose a probabilistic extension of the Hennessy-Milner logic, showing its adequacy, and then a 
probabilistic extension of the modal mu-calculus, showing its expressivity. 
5.1 An adequate logic

We extend the Hennessy-Milner logic by adding a probabilistic choice modality to express the behaviour of
distributions.

Definition 5.2 The class $\mathcal{L}$ of modal formulae over $\mathit{Act}$, ranged over by $\varphi$, is defined by the following
grammar:

\[
\begin{array}{rcl}
\varphi &:=& \top \mid \varphi_1 \wedge \varphi_2 \mid \langle a \rangle \psi \mid \neg\varphi \\
\psi &:=& \bigoplus_{i \in I} p_i \cdot \varphi_i
\end{array}
\]

We call $\varphi$ a state formula and $\psi$ a distribution formula. Note that a distribution formula $\psi$ only appears as
the continuation of a diamond modality $\langle a \rangle \psi$. We sometimes use the finite conjunction $\bigwedge_{i \in I} \varphi_i$ as syntactic
sugar.

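The satisfaction relation $\models \subseteq S \times \mathcal{L}$ is defined by

• $s \models \top$ for all $s \in S$.

• $s \models \varphi_1 \wedge \varphi_2$ if $s \models \varphi_i$ for $i = 1, 2$.

• $s \models \langle a \rangle\psi$ if for some $\Delta \in \mathcal{D}(S)$, $s \xrightarrow{a} \Delta$ and $\Delta \models \psi$.

• $s \models \neg\varphi$ if it is not the case that $s \models \varphi$.

• $\Delta \models \bigoplus_{i \in I} p_i \cdot \varphi_i$ if there are distributions $\Delta_i \in \mathcal{D}(S)$, one for each $i \in I$, such that $\Delta = \sum_{i \in I} p_i \cdot \Delta_i$ and $t \models \varphi_i$ for all $i \in I$ and $t \in \lceil\Delta_i\rceil$.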

With a slight abuse of notation, we write $\Delta \models \psi$ above to mean that $\Delta$ satisfies the distribution formula $\psi$.
The introduction of distribution formula distinguishes L from other probabilistic modal logics e.g. [551 fiU] . 
It turns out that $\mathcal{L}$ is adequate w.r.t. probabilistic bisimilarity.

Theorem 5.3 [Adequacy] Let $s$ and $t$ be any two states in a finitely branching pLTS. Then $s \sim t$ if and
only if $s =^{\mathcal{L}} t$.

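Proof: ($\Rightarrow$) Suppose $s \sim t$; we show that $s \models \varphi \Leftrightarrow t \models \varphi$ by structural induction on $\varphi$.

• Let $s \models \top$; we clearly have $t \models \top$.

• Let $s \models \varphi_1 \wedge \varphi_2$. Then $s \models \varphi_i$ for $i = 1, 2$. So by induction $t \models \varphi_i$, and we have $t \models \varphi_1 \wedge \varphi_2$. By
symmetry we also have $t \models \varphi_1 \wedge \varphi_2$ implies $s \models \varphi_1 \wedge \varphi_2$.

• Let $s \models \neg\varphi$. So $s \not\models \varphi$, and by induction we have $t \not\models \varphi$. Thus $t \models \neg\varphi$. By symmetry we also have
$t \not\models \varphi$ implies $s \not\models \varphi$.

• Let $s \models \langle a \rangle \bigoplus_{i \in I} p_i \cdot \varphi_i$. Then $s \xrightarrow{a} \Delta$ and $\Delta \models \bigoplus_{i \in I} p_i \cdot \varphi_i$ for some $\Delta$. So $\Delta = \sum_{i \in I} p_i \cdot \Delta_i$ and
for all $i \in I$ and $s' \in \lceil\Delta_i\rceil$ we have $s' \models \varphi_i$. Since $s \sim t$, there is some $\Theta$ with $t \xrightarrow{a} \Theta$ and $\Delta \sim^\dagger \Theta$.
By Proposition 2.2 we have that $\Theta = \sum_{i \in I} p_i \cdot \Theta_i$ and $\Delta_i \sim^\dagger \Theta_i$. It follows that for each $t' \in \lceil\Theta_i\rceil$
there is some $s' \in \lceil\Delta_i\rceil$ with $s' \sim t'$. So by induction we have $t' \models \varphi_i$ for all $t' \in \lceil\Theta_i\rceil$ with $i \in I$.
Therefore, we have $\Theta \models \bigoplus_{i \in I} p_i \cdot \varphi_i$. It follows that $t \models \langle a \rangle \bigoplus_{i \in I} p_i \cdot \varphi_i$. By symmetry we also have
$t \models \langle a \rangle \bigoplus_{i \in I} p_i \cdot \varphi_i$ implies $s \models \langle a \rangle \bigoplus_{i \in I} p_i \cdot \varphi_i$.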

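($\Leftarrow$) We show that the relation $=^{\mathcal{L}}$ is a probabilistic bisimulation. Suppose $s =^{\mathcal{L}} t$ and $s \xrightarrow{a} \Delta$. We have
to show that there is some $\Theta$ with $t \xrightarrow{a} \Theta$ and $\Delta \mathrel{(=^{\mathcal{L}})^\dagger} \Theta$. Consider the set

$$T := \Big\{\Theta \;\Big|\; t \xrightarrow{a} \Theta \wedge \Theta = \sum_{s' \in \lceil\Delta\rceil} \Delta(s') \cdot \Theta_{s'} \wedge \exists s' \in \lceil\Delta\rceil, \exists t' \in \lceil\Theta_{s'}\rceil : s' \neq^{\mathcal{L}} t'\Big\}$$

For each $\Theta \in T$, there must be some $s'_\Theta \in \lceil\Delta\rceil$ and $t'_\Theta \in \lceil\Theta_{s'_\Theta}\rceil$ such that (i) either there is a formula $\varphi_\Theta$
with $s'_\Theta \models \varphi_\Theta$ but $t'_\Theta \not\models \varphi_\Theta$, (ii) or there is a formula $\varphi'_\Theta$ with $t'_\Theta \models \varphi'_\Theta$ but $s'_\Theta \not\models \varphi'_\Theta$. In the latter case we
set $\varphi_\Theta = \neg\varphi'_\Theta$ and return to the former case. So for each $s' \in \lceil\Delta\rceil$ it holds that $s' \models \bigwedge_{\{\Theta \in T \mid s'_\Theta = s'\}} \varphi_\Theta$,
and for each $\Theta \in T$ with $s'_\Theta = s'$ there is some $t'_\Theta \in \lceil\Theta_{s'}\rceil$ with $t'_\Theta \not\models \bigwedge_{\{\Theta \in T \mid s'_\Theta = s'\}} \varphi_\Theta$. Let

$$\varphi := \langle a \rangle \bigoplus_{s' \in \lceil\Delta\rceil} \Delta(s') \cdot \bigwedge_{\{\Theta \in T \mid s'_\Theta = s'\}} \varphi_\Theta.$$

It is clear that $s \models \varphi$, hence $t \models \varphi$ by $s =^{\mathcal{L}} t$. It follows that there must be a $\Theta^*$ with $t \xrightarrow{a} \Theta^*$,
$\Theta^* = \sum_{s' \in \lceil\Delta\rceil} \Delta(s') \cdot \Theta^*_{s'}$ and for each $s' \in \lceil\Delta\rceil$, $t' \in \lceil\Theta^*_{s'}\rceil$ we have $t' \models \bigwedge_{\{\Theta \in T \mid s'_\Theta = s'\}} \varphi_\Theta$. This means that
$\Theta^* \notin T$ and hence for each $s' \in \lceil\Delta\rceil$, $t' \in \lceil\Theta^*_{s'}\rceil$ we have $s' =^{\mathcal{L}} t'$. It follows that $\Delta \mathrel{(=^{\mathcal{L}})^\dagger} \Theta^*$. By symmetry
all transitions of $t$ can be matched up by transitions of $s$. □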

5.2 An expressive logic 

We now add the probabilistic choice modality introduced in Section 5.1 to the modal mu-calculus, and show
that the resulting probabilistic mu-calculus is expressive w.r.t. probabilistic bisimilarity.

5.2.1 Probabilistic modal mu-calculus 

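Let Var be a countable set of variables. We define a set $\mathcal{L}_\mu$ of modal formulae in positive normal form given
by the following grammar:

$$\begin{array}{rcl}
\varphi &:=& \top \mid \bot \mid \langle a \rangle\psi \mid [a]\psi \mid \varphi_1 \wedge \varphi_2 \mid \varphi_1 \vee \varphi_2 \mid X \mid \mu X.\varphi \mid \nu X.\varphi \\
\psi &:=& \bigoplus_{i \in I} p_i \cdot \varphi_i
\end{array}$$

where $a \in Act$, $I$ is a finite index set and $\sum_{i \in I} p_i = 1$. Here we still write $\varphi$ for a state formula and $\psi$ for
a distribution formula. Sometimes we also use the finite conjunction $\bigwedge_{i \in I} \varphi_i$ and disjunction $\bigvee_{i \in I} \varphi_i$. As
usual, we have $\bigwedge_{i \in \emptyset} \varphi_i = \top$ and $\bigvee_{i \in \emptyset} \varphi_i = \bot$.

The two fixed point operators $\mu X$ and $\nu X$ bind the respective variable $X$. We apply the usual terminology
of free and bound variables in a formula and write $fv(\varphi)$ for the set of free variables in $\varphi$.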

We use environments, which bind free variables to sets of states, in order to give semantics to
formulae. We fix a finitary pLTS and let $S$ be its state set. Let

$$Env = \{\rho \mid \rho : Var \to \mathcal{P}(S)\}$$

be the set of all environments, ranged over by $\rho$. For a set $V \subseteq S$ and a variable $X \in Var$, we write
$\rho[X \mapsto V]$ for the environment that maps $X$ to $V$ and $Y$ to $\rho(Y)$ for all $Y \neq X$.

The semantics of a formula $\varphi$ can be given as the set of states satisfying it. This entails a semantic
functional $[\![\cdot]\!] : \mathcal{L}_\mu \to Env \to \mathcal{P}(S)$ defined inductively in Figure 1, where we also apply $[\![\cdot]\!]$ to distribution
formulae and $[\![\psi]\!]$ is interpreted as the set of distributions that satisfy $\psi$. As the meaning of a closed formula
$\varphi$ does not depend on the environment, we write $[\![\varphi]\!]$ for $[\![\varphi]\!]_\rho$ where $\rho$ is an arbitrary environment.

The semantics of probabilistic modal mu-calculus (pMu) is the same as that of the modal mu-calculus [36]
except for the probabilistic choice modality, which is satisfied by distributions. The characterisation of the least
fixed point formula $\mu X.\varphi$ and the greatest fixed point formula $\nu X.\varphi$ follows from the well-known Knaster-Tarski
fixed point theorem [56].

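We shall consider (closed) equation systems of formulae of the form

$$\begin{array}{rcl}
E : X_1 &=& \varphi_1 \\
&\vdots& \\
X_n &=& \varphi_n
\end{array}$$

where $X_1, \ldots, X_n$ are mutually distinct variables and $\varphi_1, \ldots, \varphi_n$ are formulae having at most $X_1, \ldots, X_n$ as
free variables. Here $E$ can be viewed as a function $E : Var \to \mathcal{L}_\mu$ defined by $E(X_i) = \varphi_i$ for $i = 1, \ldots, n$ and
$E(Y) = Y$ for other variables $Y \in Var$.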
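
$$\begin{array}{rcl}
[\![\top]\!]_\rho &=& S \\
[\![\bot]\!]_\rho &=& \emptyset \\
[\![\varphi_1 \wedge \varphi_2]\!]_\rho &=& [\![\varphi_1]\!]_\rho \cap [\![\varphi_2]\!]_\rho \\
[\![\varphi_1 \vee \varphi_2]\!]_\rho &=& [\![\varphi_1]\!]_\rho \cup [\![\varphi_2]\!]_\rho \\
[\![\langle a \rangle\psi]\!]_\rho &=& \{ s \in S \mid \exists \Delta : s \xrightarrow{a} \Delta \wedge \Delta \in [\![\psi]\!]_\rho \} \\
[\![[a]\psi]\!]_\rho &=& \{ s \in S \mid \forall \Delta : s \xrightarrow{a} \Delta \Rightarrow \Delta \in [\![\psi]\!]_\rho \} \\
[\![X]\!]_\rho &=& \rho(X) \\
[\![\mu X.\varphi]\!]_\rho &=& \bigcap \{ V \subseteq S \mid [\![\varphi]\!]_{\rho[X \mapsto V]} \subseteq V \} \\
[\![\nu X.\varphi]\!]_\rho &=& \bigcup \{ V \subseteq S \mid [\![\varphi]\!]_{\rho[X \mapsto V]} \supseteq V \} \\
[\![\bigoplus_{i \in I} p_i \cdot \varphi_i]\!]_\rho &=& \{ \Delta \in \mathcal{D}(S) \mid \Delta = \bigoplus_{i \in I} p_i \cdot \Delta_i \wedge \forall i \in I, \forall t \in \lceil\Delta_i\rceil : t \in [\![\varphi_i]\!]_\rho \}
\end{array}$$

Figure 1: Semantics of probabilistic modal mu-calculus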



An environment $\rho$ is a solution of an equation system $E$ if $\forall i : \rho(X_i) = [\![\varphi_i]\!]_\rho$. The existence of solutions
for an equation system can be seen from the following arguments. The set $Env$, which includes all candidates
for solutions, together with the partial order $\leq$ defined by

$$\rho \leq \rho' \text{ iff } \forall X \in Var : \rho(X) \subseteq \rho'(X)$$

forms a complete lattice. The equation functional $\mathcal{E} : Env \to Env$ given in the $\lambda$-calculus notation by

$$\mathcal{E} := \lambda\rho.\lambda X.[\![E(X)]\!]_\rho$$

is monotonic. Thus, the Knaster-Tarski fixed point theorem guarantees existence of solutions, and the largest
solution is

$$\rho_E := \bigsqcup \{\rho \mid \rho \leq \mathcal{E}(\rho)\}.$$

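
On a finite lattice, the greatest fixed point of a monotone function can also be reached by iterating the function downwards from the top element. The following Python sketch illustrates that idea on plain sets of states. The function `greatest_fixed_point`, the successor map `SUCC` and the function `can_step_into` are illustrative assumptions, not part of the construction above.

```python
def greatest_fixed_point(f, universe):
    """Iterate a monotone set function f downwards from the full set.

    Assumes f maps subsets of `universe` to subsets of `universe` and is
    monotone; on a finite universe the iteration then stabilises at the
    greatest fixed point of f.
    """
    current = frozenset(universe)
    while True:
        nxt = frozenset(f(current))
        if nxt == current:
            return current
        current = nxt


# Hypothetical toy system: each state maps to its list of successors.
SUCC = {1: [2], 2: [1], 3: [4], 4: []}


def can_step_into(states):
    """States that have at least one successor inside `states` (monotone)."""
    return {s for s in SUCC if any(t in states for t in SUCC[s])}


# The greatest fixed point collects the states that can keep stepping forever.
print(sorted(greatest_fixed_point(can_step_into, SUCC)))  # prints [1, 2]
```

The same downward iteration applies, in principle, to the equation functional once environments are restricted to the finitely many variables of an equation system over a finitary pLTS.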
5.2.2 Characteristic equation systems 

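As studied in [55], the behaviour of a process can be characterised by an equation system of modal formulae.
Below we show that this idea also applies in the probabilistic setting.

Definition 5.4 Given a finitary pLTS, its characteristic equation system consists of one equation for each
state $s_1, \ldots, s_n \in S$:

$$\begin{array}{rcl}
E : X_{s_1} &=& \varphi_{s_1} \\
&\vdots& \\
X_{s_n} &=& \varphi_{s_n}
\end{array}$$

where

$$\varphi_s := \Big(\bigwedge_{s \xrightarrow{a} \Delta} \langle a \rangle X_\Delta\Big) \wedge \Big(\bigwedge_{a \in Act} [a] \bigvee_{s \xrightarrow{a} \Delta} X_\Delta\Big) \qquad (5)$$

with $X_\Delta := \bigoplus_{s \in \lceil\Delta\rceil} \Delta(s) \cdot X_s$.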
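
As a small worked illustration, consider a hypothetical pLTS (not taken from the text) with $Act = \{a\}$, in which $s$ has the single transition $s \xrightarrow{a} \Delta$ with $\Delta = \frac{1}{2} \cdot \overline{s_1} + \frac{1}{2} \cdot \overline{s_2}$, and $s_1$, $s_2$ have no transitions. Reading (5) with the usual conventions that an empty conjunction is $\top$ and an empty disjunction is $\bot$, the characteristic equation system would be expected to look as follows.

```latex
\begin{align*}
X_{s_1} &= \top \wedge [a]\bot, \\
X_{s_2} &= \top \wedge [a]\bot, \\
X_{\Delta} &= \tfrac{1}{2} \cdot X_{s_1} \oplus \tfrac{1}{2} \cdot X_{s_2}, \\
X_{s} &= \langle a \rangle X_{\Delta} \wedge [a] X_{\Delta}.
\end{align*}
```

The first conjunct of $X_s$ says that the $a$-transition of $s$ must be matched, and the second says that every $a$-transition of a related state must land in a distribution described by $X_\Delta$.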

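Theorem 5.5 Suppose $E$ is a characteristic equation system. Then $s \sim t$ if and only if $t \in \rho_E(X_s)$.

Proof: ($\Leftarrow$) Let $\mathcal{R} = \{ (s,t) \mid t \in \rho_E(X_s) \}$. We first show that

$$\Theta \in [\![X_\Delta]\!]_{\rho_E} \text{ implies } \Delta \mathrel{\mathcal{R}^\dagger} \Theta. \qquad (6)$$

Let $\Delta = \bigoplus_{i \in I} p_i \cdot \overline{s_i}$, then $X_\Delta = \bigoplus_{i \in I} p_i \cdot X_{s_i}$. Suppose $\Theta \in [\![X_\Delta]\!]_{\rho_E}$. We have that $\Theta = \bigoplus_{i \in I} p_i \cdot \Theta_i$ and,
for all $i \in I$ and $t' \in \lceil\Theta_i\rceil$, that $t' \in [\![X_{s_i}]\!]_{\rho_E}$, i.e. $s_i \mathrel{\mathcal{R}} t'$. It follows that $\overline{s_i} \mathrel{\mathcal{R}^\dagger} \Theta_i$, and thus $\Delta \mathrel{\mathcal{R}^\dagger} \Theta$.
Now we show that $\mathcal{R}$ is a bisimulation.

1. Suppose $s \mathrel{\mathcal{R}} t$ and $s \xrightarrow{a} \Delta$. Then $t \in \rho_E(X_s) = [\![\varphi_s]\!]_{\rho_E}$. It follows from (5) that $t \in [\![\langle a \rangle X_\Delta]\!]_{\rho_E}$. So
there exists some $\Theta$ such that $t \xrightarrow{a} \Theta$ and $\Theta \in [\![X_\Delta]\!]_{\rho_E}$. Now we apply (6).

2. Suppose $s \mathrel{\mathcal{R}} t$ and $t \xrightarrow{a} \Theta$. Then $t \in \rho_E(X_s) = [\![\varphi_s]\!]_{\rho_E}$. It follows from (5) that $t \in [\![[a] \bigvee_{s \xrightarrow{a} \Delta} X_\Delta]\!]_{\rho_E}$.
Notice that it must be the case that $s$ can enable action $a$; otherwise, $t \in [\![[a]\bot]\!]_{\rho_E}$ and thus $t$ cannot
enable $a$ either, in contradiction with the assumption $t \xrightarrow{a} \Theta$. Therefore, $\Theta \in [\![\bigvee_{s \xrightarrow{a} \Delta} X_\Delta]\!]_{\rho_E}$, which
implies $\Theta \in [\![X_\Delta]\!]_{\rho_E}$ for some $\Delta$ with $s \xrightarrow{a} \Delta$. Now we apply (6).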

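($\Rightarrow$) We define the environment $\rho_\sim$ by

$$\rho_\sim(X_s) := \{ t \mid s \sim t \}.$$

It suffices to show that $\rho_\sim$ is a post-fixed point of $\mathcal{E}$, i.e.

$$\rho_\sim \leq \mathcal{E}(\rho_\sim) \qquad (7)$$

because in that case we have $\rho_\sim \leq \rho_E$, thus $s \sim t$ implies $t \in \rho_\sim(X_s)$ which in turn implies $t \in \rho_E(X_s)$.
We first show that

$$\Delta \sim^\dagger \Theta \text{ implies } \Theta \in [\![X_\Delta]\!]_{\rho_\sim}. \qquad (8)$$

Suppose $\Delta \sim^\dagger \Theta$; by Proposition 2.3 we have that (i) $\Delta = \bigoplus_{i \in I} p_i \cdot \overline{s_i}$, (ii) $\Theta = \bigoplus_{i \in I} p_i \cdot \overline{t_i}$, (iii) $s_i \sim t_i$ for
all $i \in I$. We know from (iii) that $t_i \in [\![X_{s_i}]\!]_{\rho_\sim}$. Using (ii) we have that $\Theta \in [\![\bigoplus_{i \in I} p_i \cdot X_{s_i}]\!]_{\rho_\sim}$. Using (i) we
obtain $\Theta \in [\![X_\Delta]\!]_{\rho_\sim}$.

Now we are in a position to show (7). Suppose $t \in \rho_\sim(X_s)$. We must prove that $t \in [\![\varphi_s]\!]_{\rho_\sim}$, i.e.

$$t \in \Big(\bigcap_{s \xrightarrow{a} \Delta} [\![\langle a \rangle X_\Delta]\!]_{\rho_\sim}\Big) \cap \Big(\bigcap_{a \in Act} [\![[a] \bigvee_{s \xrightarrow{a} \Delta} X_\Delta]\!]_{\rho_\sim}\Big)$$

by (5). This can be done by showing that $t$ belongs to each of the two parts of this intersection.

1. In the first case, we assume that $s \xrightarrow{a} \Delta$. Since $s \sim t$, there exists some $\Theta$ such that $t \xrightarrow{a} \Theta$ and
$\Delta \sim^\dagger \Theta$. By (8), we get $\Theta \in [\![X_\Delta]\!]_{\rho_\sim}$. It follows that $t \in [\![\langle a \rangle X_\Delta]\!]_{\rho_\sim}$.

2. In the second case, we suppose $t \xrightarrow{a} \Theta$ for any action $a \in Act$ and distribution $\Theta$. Then by $s \sim t$
there exists some $\Delta$ such that $s \xrightarrow{a} \Delta$ and $\Delta \sim^\dagger \Theta$. By (8) we get $\Theta \in [\![X_\Delta]\!]_{\rho_\sim}$. As a consequence,
$t \in [\![[a] \bigvee_{s \xrightarrow{a} \Delta} X_\Delta]\!]_{\rho_\sim}$. Since this holds for an arbitrary action $a$, our desired result follows.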



□ 



5.2.3 Characteristic formulae 



So far we know how to construct the characteristic equation system for a finitary pLTS. As introduced in
[46], the three transformation rules in Figure 2 can be used to obtain from an equation system $E$ a formula
whose interpretation coincides with the interpretation of $X_1$ in the greatest solution of $E$. The formula thus
obtained from a characteristic equation system is called a characteristic formula.

Theorem 5.6 Given a characteristic equation system $E$, there is a characteristic formula $\varphi_s$ such that
$\rho_E(X_s) = [\![\varphi_s]\!]$ for any state $s$. □

The above theorem, together with the results in Section 5.2.2, gives rise to the following corollary.

Corollary 5.7 For each state $s$ in a finitary pLTS, there is a characteristic formula $\varphi_s$ such that $s \sim t$ iff
$t \in [\![\varphi_s]\!]$. □



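1. Rule 1: $E \to F$

2. Rule 2: $E \to G$

3. Rule 3: $E \to H$ if $X_n \notin fv(\varphi_1, \ldots, \varphi_n)$

$$\begin{array}{llll}
E : X_1 = \varphi_1 & F : X_1 = \varphi_1 & G : X_1 = \varphi_1[\varphi_n/X_n] & H : X_1 = \varphi_1 \\
\quad\;\vdots & \quad\;\vdots & \quad\;\vdots & \quad\;\vdots \\
\quad\; X_{n-1} = \varphi_{n-1} & \quad\; X_{n-1} = \varphi_{n-1} & \quad\; X_{n-1} = \varphi_{n-1}[\varphi_n/X_n] & \quad\; X_{n-1} = \varphi_{n-1} \\
\quad\; X_n = \varphi_n & \quad\; X_n = \nu X_n.\varphi_n & \quad\; X_n = \varphi_n &
\end{array}$$

Figure 2: Transformation rules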

6 Metric characterisation 

In the definition of probabilistic bisimulation probabilities are treated as labels since they are matched only 
when they are identical. One may argue that this does not provide a robust relation: Processes that differ 
for a very small probability, for instance, would be considered just as different as processes that perform 
completely different actions. This is particularly relevant to many applications where specifications can be 
given as perfect, but impractical processes and other, practical processes are considered acceptable if they 
only differ from the specification with a negligible probability. 

To find a more flexible way to differentiate processes, researchers in this area have borrowed from mathematics
the notion of metric¹. A metric is defined as a function that associates a distance with a pair of
elements. Whereas topologists use metrics as a tool to study continuity and convergence, we will use them
to provide a measure of the difference between two processes that are not quite bisimilar.

Since different processes may behave the same, they will be given distance zero in our metric semantics. 
So we are more interested in pseudometrics than metrics. 

In the rest of this section, we fix a finite state pLTS $(S, Act, \to)$ and provide the set of pseudometrics
on $S$ with the following partial order.

Definition 6.1 The relation $\preceq$ for the set $\mathcal{M}$ of 1-bounded pseudometrics on $S$ is defined by

$$m_1 \preceq m_2 \text{ if } \forall s, t : m_1(s,t) \geq m_2(s,t).$$

Here we reverse the ordering with the purpose of characterizing bisimilarity as the greatest fixed point (cf.
Corollary 6.10).

Lemma 6.2 $(\mathcal{M}, \preceq)$ is a complete lattice.

Proof: The top element is given by $\forall s, t : \top(s,t) = 0$; the bottom element is given by $\bot(s,t) = 1$ if $s \neq t$, and $0$
otherwise. Greatest lower bounds are given by $(\sqcap X)(s,t) = \sup\{m(s,t) \mid m \in X\}$ for any $X \subseteq \mathcal{M}$. Finally,
least upper bounds are given by $\bigsqcup X = \sqcap \{m \in \mathcal{M} \mid \forall m' \in X : m' \preceq m\}$. □

Definition 6.3 $m \in \mathcal{M}$ is a state-metric if, for all $\epsilon \in [0, 1)$, $m(s,t) \leq \epsilon$ implies:

• if $s \xrightarrow{a} \Delta$ then there exists some $\Delta'$ such that $t \xrightarrow{a} \Delta'$ and $\hat{m}(\Delta, \Delta') \leq \epsilon$

where the lifted metric $\hat{m}$ was defined earlier via the Kantorovich metric. Note that if $m$ is a state-metric
then it is also a metric. By $m(s,t) \leq \epsilon$ we have $m(t,s) \leq \epsilon$, which implies

• if $t \xrightarrow{a} \Delta'$ then there exists some $\Delta$ such that $s \xrightarrow{a} \Delta$ and $\hat{m}(\Delta', \Delta) \leq \epsilon$.

¹ For simplicity, in this section we use the term metric to denote both metric and pseudometric. All the results are based on
pseudometrics.



In the above definition, we prohibit $\epsilon$ from being 1 because we use 1 to represent the distance between any two
incomparable states, including the case where one state may perform a transition and the other may not.
The greatest state-metric is defined as

$$m_{max} = \bigsqcup \{m \in \mathcal{M} \mid m \text{ is a state-metric}\}.$$

It turns out that state-metrics correspond to bisimulations and the greatest state-metric corresponds to
bisimilarity. To make the analogy closer, in what follows we will characterize $m_{max}$ as a fixed point of a
suitable monotone function on $\mathcal{M}$. First we recall the definition of Hausdorff distance.

Definition 6.4 Given a 1-bounded metric $d$ on $Z$, the Hausdorff distance between two subsets $X, Y$ of $Z$
is defined as follows:

$$H_d(X, Y) = \max\{\sup_{x \in X} \inf_{y \in Y} d(x, y), \sup_{y \in Y} \inf_{x \in X} d(y, x)\}$$

where $\inf \emptyset = 1$ and $\sup \emptyset = 0$.
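
For finite sets the definition can be evaluated directly. The following Python sketch does so, including the conventions for empty sets. The function name `hausdorff` and the sample metric in the usage note are illustrative assumptions.

```python
def hausdorff(d, xs, ys):
    """Hausdorff distance between finite collections xs and ys under a 1-bounded metric d.

    Follows the conventions of the definition: the infimum over an empty set
    is 1 and the supremum over an empty set is 0.
    """
    def directed(us, vs):
        # sup over u of inf over v of d(u, v)
        return max(
            (min((d(u, v) for v in vs), default=1.0) for u in us),
            default=0.0,
        )

    return max(directed(xs, ys), directed(ys, xs))
```

For instance, with the 1-bounded metric `d = lambda x, y: min(1.0, abs(x - y))` on numbers, `hausdorff(d, [0.0, 0.5], [0.25])` evaluates to `0.25`, while `hausdorff(d, [0.0], [])` evaluates to `1.0`, which mirrors the case where one state can move and the other cannot.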

Next we define a function $F$ on $\mathcal{M}$ by using the Hausdorff distance.

Definition 6.5 Let $der(s, a) = \{\Delta \mid s \xrightarrow{a} \Delta\}$. $F(m)$ is a pseudometric given by:

$$F(m)(s,t) = \sup_{a \in Act} \{H_{\hat{m}}(der(s, a), der(t, a))\}.$$

Thus we have the following property.

Lemma 6.6 For all $\epsilon \in [0, 1)$, $F(m)(s,t) \leq \epsilon$ if and only if:

• if $s \xrightarrow{a} \Delta$ then there exists some $\Delta'$ such that $t \xrightarrow{a} \Delta'$ and $\hat{m}(\Delta, \Delta') \leq \epsilon$;

• if $t \xrightarrow{a} \Delta'$ then there exists some $\Delta$ such that $s \xrightarrow{a} \Delta$ and $\hat{m}(\Delta', \Delta) \leq \epsilon$. □

The above lemma can be proved by directly checking the definition of $F$, as can the next lemma.

Lemma 6.7 $m$ is a state-metric if and only if $m \preceq F(m)$. □

Consequently we have the following characterisation:

$$m_{max} = \bigsqcup \{m \in \mathcal{M} \mid m \preceq F(m)\}.$$

Lemma 6.8 $F$ is monotone on $\mathcal{M}$. □

Because of Lemmas 6.2 and 6.8 we can apply the Knaster-Tarski fixed point theorem, which tells us that $m_{max}$
is the greatest fixed point of $F$. Furthermore, by Lemma 6.7 we know that $m_{max}$ is indeed a state-metric,
and it is the greatest state-metric.

We now show the correspondence between state-metrics and bisimulations. 

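Theorem 6.9 Given a binary relation $\mathcal{R}$ and a pseudometric $m \in \mathcal{M}$ on a finite state pLTS such that

$$m(s,t) = \begin{cases} 0 & \text{if } s \mathrel{\mathcal{R}} t \\ 1 & \text{otherwise.} \end{cases} \qquad (9)$$

Then $\mathcal{R}$ is a probabilistic bisimulation if and only if $m$ is a state-metric.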






Proof: The result can be proved by using Theorem 3.5, which in turn relies on Theorem 2.4 (1). Below we
give an alternative proof that uses Theorem 2.4 (2) instead.

Given two distributions $\Delta, \Delta'$ over $S$, let us consider how to compute $\hat{m}(\Delta, \Delta')$ if $\mathcal{R}$ is an equivalence
relation. Since $S$ is finite, we may assume that $V_1, \ldots, V_n \in S/\mathcal{R}$ are all the equivalence classes of $S$ under
$\mathcal{R}$. If $s, t \in V_i$ for some $i \in 1..n$, then $m(s,t) = 0$, which implies $x_s = x_t$ by the first constraint of the linear program that defines $\hat{m}$. So
for each $i \in 1..n$ there exists some $x_i$ such that $x_i = x_s$ for all $s \in V_i$. Thus, some summands of that linear program can be
grouped together and we have the following linear program:

$$\sum_{i \in 1..n} (\Delta(V_i) - \Delta'(V_i)) x_i \qquad (10)$$

with the constraint $x_i - x_j \leq 1$ for any $i, j \in 1..n$ with $i \neq j$. Briefly speaking, if $\mathcal{R}$ is an equivalence relation
then $\hat{m}(\Delta, \Delta')$ is obtained by maximizing the linear program (10).

($\Rightarrow$) Suppose $\mathcal{R}$ is a bisimulation and $m(s,t) = 0$. From the assumption in (9) we know that $\mathcal{R}$ is an
equivalence relation. By the definition of $m$ we have $s \mathrel{\mathcal{R}} t$. If $s \xrightarrow{a} \Delta$ then $t \xrightarrow{a} \Delta'$ for some $\Delta'$ such that
$\Delta \mathrel{\mathcal{R}^\dagger} \Delta'$. To show that $m$ is a state-metric it suffices to prove $\hat{m}(\Delta, \Delta') = 0$. We know from $\Delta \mathrel{\mathcal{R}^\dagger} \Delta'$
and Theorem 2.4 (2) that $\Delta(V_i) = \Delta'(V_i)$, for each $i \in 1..n$. It follows that (10) is maximized to be 0, thus
$\hat{m}(\Delta, \Delta') = 0$.

($\Leftarrow$) Suppose $m$ is a state-metric and has the relation in (9). Notice that $\mathcal{R}$ is an equivalence relation.
We show that it is a bisimulation. Suppose $s \mathrel{\mathcal{R}} t$, which means $m(s,t) = 0$. If $s \xrightarrow{a} \Delta$ then $t \xrightarrow{a} \Delta'$ for
some $\Delta'$ such that $\hat{m}(\Delta, \Delta') = 0$. To ensure that $\hat{m}(\Delta, \Delta') = 0$, in (10) the following two conditions must
be satisfied.

1. No coefficient is positive. Otherwise, if $\Delta(V_i) - \Delta'(V_i) > 0$ then (10) would be maximized to a value
not less than $(\Delta(V_i) - \Delta'(V_i))$, which is greater than 0.

2. It is not the case that at least one coefficient is negative and the other coefficients are either negative
or 0. Otherwise, by summing up all the coefficients, we would get

$$\Delta(S) - \Delta'(S) < 0$$

which contradicts the assumption that $\Delta$ and $\Delta'$ are distributions over $S$.

Therefore the only possibility is that all coefficients in (10) are 0, i.e., $\Delta(V_i) = \Delta'(V_i)$ for any equivalence
class $V_i \in S/\mathcal{R}$. It follows from Theorem 2.4 (2) that $\Delta \mathrel{\mathcal{R}^\dagger} \Delta'$. So we have shown that $\mathcal{R}$ is indeed a
bisimulation. □
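
The grouped linear program (10) is simple enough to evaluate by hand. Since the coefficients $\Delta(V_i) - \Delta'(V_i)$ sum to zero, shifting all $x_i$ by a constant does not change the objective, and the constraint $x_i - x_j \leq 1$ confines them to an interval of width 1; a short argument then suggests that the maximum equals the sum of the positive coefficients. The following Python sketch computes that value under this assumption. The function name `lifted_distance_01` and the representation of distributions as dictionaries are illustrative choices.

```python
from fractions import Fraction


def lifted_distance_01(delta, delta_prime, classes):
    """Maximum of sum_i (delta(V_i) - delta_prime(V_i)) * x_i subject to x_i - x_j <= 1.

    `delta` and `delta_prime` map states to probabilities (Fractions or ints),
    and `classes` is a list of disjoint sets of states, the equivalence classes V_i.
    Assumes both distributions have total mass 1, so the coefficients sum to 0
    and the maximum is the sum of the positive coefficients.
    """
    total = Fraction(0)
    for cls in classes:
        coeff = sum(
            (Fraction(delta.get(s, 0)) - Fraction(delta_prime.get(s, 0)) for s in cls),
            Fraction(0),
        )
        if coeff > 0:
            total += coeff
    return total
```

In particular the result is 0 exactly when both distributions give the same mass to every class, which is the condition used in both directions of the proof above.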

Corollary 6.10 Let $s$ and $t$ be two states in a finite state pLTS. Then $s \sim t$ if and only if $m_{max}(s,t) = 0$.

Proof: ($\Rightarrow$) Since $\sim$ is a bisimulation, by Theorem 6.9 there exists some state-metric $m$ such that $s \sim t$ iff
$m(s,t) = 0$. By the definition of $m_{max}$ we have $m \preceq m_{max}$. Therefore $m_{max}(s,t) \leq m(s,t) = 0$.

($\Leftarrow$) From $m_{max}$ we construct a pseudometric $m$ as follows.

$$m(s,t) = \begin{cases} 0 & \text{if } m_{max}(s,t) = 0 \\ 1 & \text{otherwise.} \end{cases}$$

Since $m_{max}$ is a state-metric, it is easy to see that $m$ is also a state-metric. Now we construct a binary
relation $\mathcal{R}$ such that $\forall s, s' : s \mathrel{\mathcal{R}} s'$ iff $m(s, s') = 0$. It follows from Theorem 6.9 that $\mathcal{R}$ is a bisimulation. If
$m_{max}(s,t) = 0$, then $m(s,t) = 0$ and thus $s \mathrel{\mathcal{R}} t$. Therefore we have the required result $s \sim t$ because $\sim$ is
the largest bisimulation. □



7 Algorithmic characterisation 

In this section we propose an "on the fly" algorithm for checking if two states in a finitary pLTS are bisimilar. 

Algorithm 1 Check($\Delta$, $\Theta$, $\mathcal{R}$)

Input: A nonempty finite set $S$, distributions $\Delta, \Theta \in \mathcal{D}(S)$ and $\mathcal{R} \subseteq S \times S$
Output: If $\Delta \mathrel{\mathcal{R}^\dagger} \Theta$ then "yes" else "no"
Method:
    Construct the network $\mathcal{N}(\Delta, \Theta, \mathcal{R})$
    Compute the maximum flow $F$ in $\mathcal{N}(\Delta, \Theta, \mathcal{R})$
    If $F < 1$ then return "no" else "yes".



An important ingredient of the algorithm is to check if two distributions are related by a lifted relation.
Fortunately, Theorem 3.7 already provides us with a method for deciding whether $\Delta \mathrel{\mathcal{R}^\dagger} \Theta$, for two given
distributions $\Delta, \Theta$ and a relation $\mathcal{R}$. We construct the network $\mathcal{N}(\Delta, \Theta, \mathcal{R})$ and compute the maximum flow with
well-known methods, as sketched in Algorithm 1.

As shown in [3], computing the maximum flow in a network can be done in time $O(n^3/\log n)$ and space
$O(n^2)$, where $n$ is the number of nodes in the network. So we immediately have the following result.

Lemma 7.1 The test whether $\Delta \mathrel{\mathcal{R}^\dagger} \Theta$ can be done in time $O(n^3/\log n)$ and space $O(n^2)$. □
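
The check of Algorithm 1 can be sketched in Python as follows. The sketch assumes the usual shape of the network: a source connected to a copy of each state in the support of $\Delta$ with capacity $\Delta(s)$, a copy of each state in the support of $\Theta$ connected to the sink with capacity $\Theta(t)$, and an edge of capacity 1 for every related pair. The precise network $\mathcal{N}(\Delta, \Theta, \mathcal{R})$ is the one defined earlier in the paper and may differ in detail. The maximum flow is computed with the simple Edmonds-Karp method rather than the faster algorithm of [3], and the names `max_flow` and `check_lifted` are illustrative.

```python
from collections import defaultdict, deque
from fractions import Fraction


def max_flow(cap, src, snk):
    """Edmonds-Karp maximum flow on a dict-of-dicts of residual capacities."""
    flow = Fraction(0)
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {src: None}
        queue = deque([src])
        while queue and snk not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if snk not in parent:
            return flow
        # Walk back from the sink to collect the path, then push the bottleneck.
        path, v = [], snk
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] = cap[v].get(u, Fraction(0)) + push
        flow += push


def check_lifted(delta, theta, rel):
    """Return True iff the maximum flow in the assumed network is 1.

    `delta` and `theta` map states to exact probabilities (Fractions),
    `rel` is a set of pairs (s, t).
    """
    src, snk = ("src",), ("snk",)
    cap = defaultdict(dict)
    for s, p in delta.items():
        cap[src][("L", s)] = Fraction(p)
    for t, q in theta.items():
        cap[("R", t)][snk] = Fraction(q)
    for s, t in rel:
        if s in delta and t in theta:
            cap[("L", s)][("R", t)] = Fraction(1)
    return max_flow(cap, src, snk) >= 1
```

Exact rational arithmetic is used so that the comparison of the flow with 1 is not affected by floating-point rounding.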

We now present a bisimilarity-checking algorithm by adapting the algorithm proposed in [39] for value-passing
processes, which in turn was inspired by [22].

The main procedure in the algorithm is Bisim(s,t). It starts with the initial state pair (s,t), trying to 
find the smallest bisimulation relation containing the pair by matching transitions from each pair of states 
it reaches. It uses three auxiliary data structures: 

• NotBisim collects all state pairs that have already been detected as not bisimilar. 

• Visited collects all state pairs that have already been visited. 

• Assumed collects all state pairs that have already been visited and assumed to be bisimilar. 

The core procedure, Match, is called from function Bis inside the main procedure Bisim. Whenever a new
pair of states is encountered it is inserted into Visited. If two states fail to match each other's transitions
then they are not bisimilar and the pair is added to NotBisim. If the current state pair has been visited
before, we check whether it is in NotBisim. If this is the case, we return false. Otherwise, a loop has been
detected and we make the assumption that the two states are bisimilar, by inserting the pair into Assumed,
and return true. Later on, if we find that the two states are not bisimilar after finishing searching the
loop, then the assumption is wrong, so we first add the pair into NotBisim and then raise the exception
WrongAssumption, which forces the function Bis to run again, with the new information that the two
states in this pair are not bisimilar. In this case, the size of NotBisim has been increased by at least one.
Hence, Bis can only be called finitely many times. Therefore, the procedure Bisim($s$, $t$) will terminate.
If it returns true, then the set (Visited − NotBisim) constitutes a bisimulation relation containing the pair
$(s, t)$.

The main difference from the algorithm for checking non-probabilistic bisimilarity in [39] is the introduction
of the procedure MatchDistribution($\Delta$, $\Theta$), where we approximate $\sim$ by a binary relation $\mathcal{R}$ which is
coarser than $\sim$ in general, and we check the validity of $\Delta \mathrel{\mathcal{R}^\dagger} \Theta$. If $\Delta \mathrel{\mathcal{R}^\dagger} \Theta$ does not hold, then $\Delta \sim^\dagger \Theta$
does not hold either and MatchDistribution($\Delta$, $\Theta$) returns false correctly. Otherwise, the two distributions $\Delta$
and $\Theta$ are considered equivalent with respect to $\mathcal{R}$ and we move on to match other pairs of distributions.
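
The procedures described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: a pLTS is represented by a hypothetical dictionary `trans` mapping a state to a dictionary from actions to lists of distributions, distributions map states to `Fraction` probabilities, and `check` uses an assumed network shape with a simple augmenting-path maximum flow.

```python
from collections import defaultdict, deque
from fractions import Fraction


class WrongAssumption(Exception):
    """A pair that was assumed bisimilar turned out not to be."""


def check(delta, theta, rel):
    """True iff the maximum flow in the assumed network for (delta, theta, rel) is 1."""
    src, snk = ("src",), ("snk",)
    cap = defaultdict(dict)
    for s, p in delta.items():
        cap[src][("L", s)] = Fraction(p)
    for t, q in theta.items():
        cap[("R", t)][snk] = Fraction(q)
    for s, t in rel:
        cap[("L", s)][("R", t)] = Fraction(1)
    flow = Fraction(0)
    while True:
        parent, queue = {src: None}, deque([src])
        while queue and snk not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if snk not in parent:
            return flow >= 1
        path, v = [], snk
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] = cap[v].get(u, Fraction(0)) + push
        flow += push


def bisim(trans, s0, t0):
    not_bisim = set()  # survives across reruns of bis()

    def bis():
        visited, assumed = set(), set()

        def match(s, t):
            visited.add((s, t))
            actions = set(trans.get(s, {})) | set(trans.get(t, {}))
            ok = all(match_action(s, t, a) for a in actions)
            if not ok:
                not_bisim.add((s, t))
                if (s, t) in assumed:
                    raise WrongAssumption
            return ok

        def match_action(s, t, a):
            ds = trans.get(s, {}).get(a, [])
            ths = trans.get(t, {}).get(a, [])
            b = [[match_distribution(d, th) for th in ths] for d in ds]
            forth = all(any(row) for row in b)
            back = all(any(b[i][j] for i in range(len(ds))) for j in range(len(ths)))
            return forth and back

        def match_distribution(delta, theta):
            rel = {(x, y) for x in delta for y in theta if close(x, y)}
            return check(delta, theta, rel)

        def close(x, y):
            if (x, y) in not_bisim:
                return False
            if (x, y) in visited:
                assumed.add((x, y))  # loop detected: assume bisimilar for now
                return True
            return match(x, y)

        return match(s0, t0)

    while True:
        try:
            return bis()
        except WrongAssumption:
            pass  # rerun with the enlarged not_bisim set
```

For example, with `trans = {"s": {"a": [{"s1": Fraction(1, 2), "s2": Fraction(1, 2)}]}, "t": {"a": [{"t1": Fraction(1)}]}}`, where `s1`, `s2` and `t1` have no transitions, `bisim(trans, "s", "t")` returns `True`.
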
The correctness of the algorithm is stated in the following theorem. 

Theorem 7.2 Given two states $s_0$ and $t_0$ in a finitary pLTS, the function Bisim($s_0$, $t_0$) terminates, and it
returns true if and only if $s_0 \sim t_0$.

Algorithm 2 Bisim($s$, $t$)

Bisim($s$, $t$) = {
    NotBisim := {}
    fun Bis($s$, $t$) = {
        Visited := {}
        Assumed := {}
        Match($s$, $t$)
    } handle WrongAssumption $\Rightarrow$ Bis($s$, $t$)
    return Bis($s$, $t$)
}

Match($s$, $t$) =
    Visited := Visited $\cup$ {($s$, $t$)}
    $b$ = $\bigwedge_{a \in A}$ MatchAction($s$, $t$, $a$)
    if $b$ = false then
        NotBisim := NotBisim $\cup$ {($s$, $t$)}
        if ($s$, $t$) $\in$ Assumed then
            raise WrongAssumption
        end if
    end if
    return $b$

MatchAction($s$, $t$, $a$) =
    for all $s \xrightarrow{a} \Delta_i$ do
        for all $t \xrightarrow{a} \Theta_j$ do
            $b_{ij}$ = MatchDistribution($\Delta_i$, $\Theta_j$)
        end for
    end for
    return $(\bigwedge_i (\bigvee_j b_{ij})) \wedge (\bigwedge_j (\bigvee_i b_{ij}))$

MatchDistribution($\Delta$, $\Theta$) =
    Assume $\lceil\Delta\rceil = \{s_1, \ldots, s_n\}$ and $\lceil\Theta\rceil = \{t_1, \ldots, t_m\}$
    $\mathcal{R}$ := {($s_i$, $t_j$) | Close($s_i$, $t_j$) = true}
    return Check($\Delta$, $\Theta$, $\mathcal{R}$)

Close($s$, $t$) =
    if ($s$, $t$) $\in$ NotBisim then
        return false
    else if ($s$, $t$) $\in$ Visited then
        Assumed := Assumed $\cup$ {($s$, $t$)}
        return true
    else
        return Match($s$, $t$)
    end if

Proof: Let $Bis_i$ stand for the $i$-th execution of the function Bis. Let $Assumed_i$ and $NotBisim_i$ be the
sets Assumed and NotBisim at the end of $Bis_i$. When $Bis_i$ is finished, either a WrongAssumption is
raised or no WrongAssumption is raised. In the former case, $Assumed_i \cap NotBisim_i \neq \emptyset$; in the latter
case, the execution of the function Bisim is completed. From function Close we know that $Assumed_i \cap
NotBisim_{i-1} = \emptyset$. Now it follows from the simple fact $NotBisim_{i-1} \subseteq NotBisim_i$ that $NotBisim_{i-1} \subset
NotBisim_i$. Since we are considering finitary pLTSs, there is some $j$ such that $NotBisim_{j-1} = NotBisim_j$,
when all the non-bisimilar state pairs reachable from $s_0$ and $t_0$ have been found and Bisim must terminate.

For the correctness of the algorithm, we consider the relation $\mathcal{R}_i = Visited_i - NotBisim_i$, where $Visited_i$
is the set Visited at the end of $Bis_i$. Let $Bis_k$ be the last execution of Bis. For each $i \leq k$, the relation
$\mathcal{R}_i$ can be regarded as an approximation of $\sim$, as far as the states appearing in $\mathcal{R}_i$ are concerned. Moreover,
$\mathcal{R}_i$ is a coarser approximation because if two states $s, t$ are re-visited but their relation is unknown, they
are assumed to be bisimilar. Therefore, if $Bis_k(s_0, t_0)$ returns false, then $s_0 \not\sim t_0$. On the other hand,
if $Bis_k(s_0, t_0)$ returns true, then $\mathcal{R}_k$ constitutes a bisimulation relation containing the pair $(s_0, t_0)$. This
follows because Match($s_0$, $t_0$) = true, which basically means that whenever $s \mathrel{\mathcal{R}_k} t$ and $s \xrightarrow{a} \Delta$ there exists
some transition $t \xrightarrow{a} \Theta$ such that Check($\Delta$, $\Theta$, $\mathcal{R}_k$) = true, i.e. $\Delta \mathrel{\mathcal{R}_k^\dagger} \Theta$. Indeed, this rules out the
possibility that $s_0 \not\sim t_0$, as otherwise we would have $s_0 \not\sim_\omega t_0$ by Proposition 4.4, that is $s_0 \not\sim_n t_0$ for some
$n > 0$. The latter means that some transition $s_0 \xrightarrow{a} \Delta$ exists such that for all $t_0 \xrightarrow{a} \Theta$ we have $\Delta \not\sim_{n-1}^\dagger \Theta$,
or symmetrically with the roles of $s_0$ and $t_0$ exchanged, i.e. $\Delta$ and $\Theta$ can be distinguished at level $n$, so a
contradiction arises. □

Below we consider the time and space complexities of the algorithm. 

Theorem 7.3 Let $s$ and $t$ be two states in a pLTS with $n$ states in total. The function Bisim($s$, $t$) terminates
in time $O(n^7/\log n)$ and space $O(n^2)$.

Proof: The number of state pairs is bounded by $n^2$. In the worst case, each execution of the function
Bis($s$, $t$) only yields one new pair of states that are not bisimilar. The number of state pairs examined in the
first execution of Bis($s$, $t$) is at most $O(n^2)$, in the second execution at most $O(n^2 - 1)$, and so on. Therefore,
the total number of state pairs examined is at most $O(n^2 + (n^2 - 1) + \cdots + 1) = O(n^4)$. When a state pair
$(s, t)$ is examined, each transition of $s$ is compared with all transitions of $t$ labelled with the same action.
Since the pLTS is finitely branching, we may assume that each state has at most $c$ outgoing transitions.
Therefore, for each state pair, the number of comparisons of transitions is bounded by $c^2$. A comparison of
two transitions calls the function Check once, which requires time $O(n^3/\log n)$ by Lemma 7.1. As a result,
examining each state pair takes time $O(c^2 n^3/\log n)$. Finally, the worst case time complexity of executing
Bisim($s$, $t$) is $O(n^7/\log n)$.

The space requirement of the algorithm is easily seen to be $O(n^2)$, in view of Lemma 7.1. □
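
The arithmetic behind the time bound can be spelled out as follows, treating the branching bound $c$ as a constant as in the proof above.

```latex
\begin{align*}
\sum_{k=1}^{n^2} k &= \frac{n^2 (n^2 + 1)}{2} = O(n^4), \\
O(n^4) \cdot O\!\left(\frac{c^2 n^3}{\log n}\right) &= O\!\left(\frac{n^7}{\log n}\right).
\end{align*}
```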

Remark 7.4 With mild modification, the above algorithm can be adapted to check probabilistic similarity.
We simply remove the underlined part in the function MatchAction (the second conjunct of its return
expression, which requires every transition of $t$ to be matched by one of $s$); the rest of the algorithm remains
unchanged. Similar to the analysis in Theorems 7.2 and 7.3, the new algorithm can be shown to correctly check
probabilistic similarity over finitary pLTSs; its worst case time and space complexities are still $O(n^7/\log n)$
and $O(n^2)$, respectively.

8 Conclusion 

Defining behavioural equivalences or preorders for probabilistic processes often involves a lifting operation
that turns a binary relation $\mathcal{R}$ on states into a relation $\mathcal{R}^\dagger$ on distributions over states. We have shown that
several different proposals for lifting relations can be reconciled: they are nothing more than different forms
of essentially the same lifting operation. More interestingly, we have discovered that this lifting operation
corresponds well to the Kantorovich metric, a fundamental concept used in mathematics to lift a metric
on states to a metric on distributions over states. Moreover, the lifting operation is related to the
maximum flow problem in optimisation theory.

The lifting operation leads to a neat notion of probabilistic bisimulation, for which we have provided
logical, metric, and algorithmic characterisations.

1. We have introduced a probabilistic choice modality to specify the behaviour of distributions of states.
Adding the new modality to the Hennessy-Milner logic and the modal mu-calculus results in an adequate
and an expressive logic w.r.t. probabilistic bisimilarity, respectively.

2. Due to the correspondence of the lifting operation and the Kantorovich metric, bisimulations can
be naturally characterised as pseudometrics which are post-fixed points of a monotone function, and
bisimilarity as the greatest post-fixed point of the function.

3. We have presented an "on the fly" algorithm to check if two states in a finitary pLTS are bisimilar.
The algorithm is based on the close relationship between the lifting operation and the maximum flow
problem.

In the belief that a good scientific concept is often elegant, even seen from different perspectives, we consider 
the lifting operation and probabilistic bisimulation as two concepts in probabilistic concurrency theory that 
are formulated in the right way. 